Orchestrator on Raft: internals, benefits and considerations


  1. Orchestrator on Raft: internals, benefits and considerations. Shlomi Noach, GitHub. PerconaLive 2018

  2. About me: @github/database-infrastructure. Author of orchestrator, gh-ost, freno, ccql and others. Blog at http://openark.org @ShlomiNoach

  3. Agenda: Raft overview; why orchestrator/raft; orchestrator/raft implementation and nuances; HA, fencing; service discovery; considerations

  4. Raft: consensus algorithm; quorum based; in-order replication log (delivery, lag); snapshots

  5. HashiCorp raft: golang raft implementation; used by Consul; recently hit 1.0.0. github.com/hashicorp/raft
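To make the library's role concrete, here is a minimal, hypothetical sketch of standing up a hashicorp/raft node with in-memory stores. It is not orchestrator's actual wiring (orchestrator persists the raft log and applies commands against its own backend database); it only shows the moving parts the slides refer to: a finite state machine fed by the replicated log, stores for log entries and snapshots, and a cluster bootstrap in which a majority (quorum) of peers is needed to elect a leader.

```go
// Illustrative sketch only; not orchestrator's actual wiring.
package main

import (
	"fmt"
	"io"

	"github.com/hashicorp/raft"
)

// demoFSM is a placeholder finite state machine. hashicorp/raft calls Apply
// for every committed log entry, in order, on every node.
type demoFSM struct{}

func (f *demoFSM) Apply(l *raft.Log) interface{} {
	fmt.Printf("applying committed entry: %s\n", l.Data)
	return nil
}

// Snapshot/Restore let a node that fell behind (or a brand new node) catch up
// from a snapshot instead of replaying the entire log. Stubbed out here.
func (f *demoFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, fmt.Errorf("not implemented") }
func (f *demoFSM) Restore(rc io.ReadCloser) error      { return rc.Close() }

func main() {
	config := raft.DefaultConfig()
	config.LocalID = raft.ServerID("node1")

	logStore := raft.NewInmemStore()    // raft log entries
	stableStore := raft.NewInmemStore() // term/vote metadata
	snapshots := raft.NewInmemSnapshotStore()
	addr, transport := raft.NewInmemTransport("")

	r, err := raft.NewRaft(config, &demoFSM{}, logStore, stableStore, snapshots, transport)
	if err != nil {
		panic(err)
	}

	// Bootstrap a single-node cluster; a real deployment lists all peers,
	// and a majority of them must be reachable to elect a leader.
	future := r.BootstrapCluster(raft.Configuration{
		Servers: []raft.Server{{ID: config.LocalID, Address: addr}},
	})
	if err := future.Error(); err != nil {
		panic(err)
	}
}
```

Every committed entry reaches Apply on every node, in log order, which is the in-order delivery property the previous slide mentions.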

  6. orchestrator: MySQL high availability solution and replication topology manager; developed at GitHub; Apache 2 license. github.com/github/orchestrator

  7. Why orchestrator/raft: remove the MySQL backend dependency; DC fencing. And then good things happened that were not planned: better cross-DC deployments, DC-local KV control, Kubernetes friendly

  8. orchestrator/raft: n orchestrator nodes form a raft cluster. Each node has its own, dedicated backend database (MySQL or SQLite). All nodes probe the topologies; all nodes run failure detection; only the leader runs failure recoveries.
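The "only the leader recovers" rule can be pictured with a rough sketch (not orchestrator's real code; detectFailures and recoverCluster are hypothetical stand-ins): every node keeps probing and detecting, but recovery actions are gated on this node currently being the raft leader.

```go
package sketch

import (
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

// detectFailures and recoverCluster are hypothetical stand-ins for
// orchestrator's detection and recovery logic.
func detectFailures() []string      { return nil }
func recoverCluster(cluster string) { fmt.Println("recovering", cluster) }

// runLoop illustrates the division of labor: every node detects, but only the
// current raft leader acts on what it detected.
func runLoop(r *raft.Raft) {
	for range time.Tick(time.Second) {
		failed := detectFailures() // all nodes probe and detect

		if r.State() != raft.Leader {
			continue // followers observe, but never run recoveries
		}
		for _, cluster := range failed {
			recoverCluster(cluster)
		}
	}
}
```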

  9. Implementation & deployment @ GitHub: 5 nodes (2xDC1, 2xDC2, 1xDC3); 1 second raft polling interval; step-down; raft-yield; SQLite-backed log store; MySQL backend (SQLite backend use case in the works)

  10. A high availability scenario: o2 is the leader of a 3-node orchestrator/raft setup (o1, o2, o3).

  11. Injecting failure on the master: killall -9 mysqld. o2 detects the failure. About to recover, but…

  12. Injecting a 2nd failure, on o2: DROP DATABASE orchestrator; o2 freaks out. 5 seconds later it steps down.

  13. orchestrator recovery: o1 grabs leadership.

  14. MySQL recovery: o1 detected the failure even before stepping up as leader. o1, now leader, kicks off recovery and fails over the MySQL master.

  15. orchestrator self health tests: meanwhile, o2 panics and bails out.

  16. puppet: some time later, puppet kicks the orchestrator service back up on o2.

  17. orchestrator startup: the orchestrator service on o2 bootstraps, creates the orchestrator schema and tables.

  18. Joining the raft cluster: o2 recovers from a raft snapshot, acquires the raft log from an active node, and rejoins the group.
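This rejoin path, restore from a snapshot and then replay the remaining log, is modeled by hashicorp/raft's FSM contract. A minimal sketch, assuming the replicated state can be serialized as a single byte slice (a deliberate simplification, not orchestrator's schema):

```go
package sketch

import (
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// kvFSM is a hypothetical FSM whose whole state is one byte slice.
type kvFSM struct {
	mu    sync.Mutex
	state []byte
}

// Snapshot captures a point-in-time copy of the state on a healthy node;
// raft ships it to nodes too far behind to catch up from the log alone.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	cp := append([]byte(nil), f.state...)
	return &kvSnapshot{data: cp}, nil
}

// Restore runs on the rejoining node (o2 in the scenario above): it replaces
// local state with the snapshot contents, after which raft replays the
// remaining committed entries through Apply.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	data, err := io.ReadAll(rc)
	if err != nil {
		return err
	}
	f.mu.Lock()
	f.state = data
	f.mu.Unlock()
	return nil
}

// Apply replaces local state with the committed entry's payload; a real FSM
// would interpret the command instead.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.state = append([]byte(nil), l.Data...)
	return nil
}

// kvSnapshot writes the captured state to the snapshot sink.
type kvSnapshot struct{ data []byte }

func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	if _, err := sink.Write(s.data); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

func (s *kvSnapshot) Release() {}
```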

  19. Grabbing leadership Some time later, o2 grabs leadership � � o1 � � � � o2 � � � � o3 �

  20. DC fencing: assume this 3 DC setup, with one orchestrator node in each DC (DC1, DC2, DC3) and the master and a few replicas in DC2. What happens if DC2 gets network partitioned, i.e. no network in or out of DC2?

  21. DC fencing: from the point of view of DC2's servers, and in particular of DC2's orchestrator node: the master and replicas are fine; DC1 and DC3 servers are all dead; no need for failover. However, DC2's orchestrator is not part of a quorum (a 3-node raft cluster needs a majority of 2, and DC2 holds only 1 node), hence it is not the leader. It doesn't call the shots.

  22. DC fencing: in the eyes of either DC1's or DC3's orchestrator: all DC2 servers, including the master, are dead; there is need for failover. DC1's and DC3's orchestrator nodes form a quorum. One of them will become the leader. The leader will initiate failover.

  23. DC fencing: depicted potential failover result. The new master is from DC3.

  24. orchestrator/raft & Consul: orchestrator is Consul-aware. Upon failover, orchestrator updates Consul KV with the identity of the promoted master. Consul @ GitHub is DC-local, with no replication between Consul setups; orchestrator nodes update Consul locally in each DC.
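For illustration, a minimal sketch of that kind of KV update using Consul's Go API; the key layout and value format here are assumptions, not necessarily orchestrator's actual keys. Each orchestrator node would run this against its DC-local Consul agent.

```go
package main

import (
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the DC-local Consul agent (default: 127.0.0.1:8500).
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical key/value: record the newly promoted master for a cluster.
	// orchestrator's real key layout may differ.
	pair := &consulapi.KVPair{
		Key:   "mysql/master/mycluster",
		Value: []byte("db-new-master.dc3.example.com:3306"),
	}
	if _, err := client.KV().Put(pair, nil); err != nil {
		log.Fatal(err)
	}
}
```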

  25. Considerations, watch out for: eventual consistency is not always your best friend. What happens if, upon replay of the raft log, you hit two failovers for the same cluster? NOW() and other time-based assumptions. Reapplying snapshot/log upon startup.
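One hypothetical way to picture the replay pitfall (none of these names are orchestrator's real API): carry the failure timestamp inside the raft command itself rather than calling NOW() at apply time, and keep a per-cluster record of handled failovers so that re-applying the log at startup cannot trigger a second failover.

```go
package sketch

import (
	"encoding/json"
	"time"

	"github.com/hashicorp/raft"
)

// failoverCommand is a hypothetical raft log payload: it carries its own
// timestamp so that replay does not depend on the replaying node's clock.
type failoverCommand struct {
	Cluster  string    `json:"cluster"`
	FailedAt time.Time `json:"failed_at"`
}

type recoveryFSM struct {
	// lastRecovery remembers the most recent failover acted upon per cluster,
	// so replaying an old entry does not trigger a second failover.
	lastRecovery map[string]time.Time
}

func (f *recoveryFSM) Apply(l *raft.Log) interface{} {
	var cmd failoverCommand
	if err := json.Unmarshal(l.Data, &cmd); err != nil {
		return err
	}
	// Idempotency guard: if we already handled a failover for this cluster at
	// or after cmd.FailedAt, re-applying this entry must be a no-op.
	if last, ok := f.lastRecovery[cmd.Cluster]; ok && !last.Before(cmd.FailedAt) {
		return nil
	}
	f.lastRecovery[cmd.Cluster] = cmd.FailedAt
	// ... act on the failover here, using cmd.FailedAt rather than time.Now() ...
	return nil
}
```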

  26. orchestrator/raft roadmap: Kubernetes; ClusterIP-based configuration in progress; already container-friendly via auto-reprovisioning of nodes via raft.

  27. Thank you! Questions? github.com/shlomi-noach @ShlomiNoach
