CTDB Remix I: Dreaming the Fantasy

SLIDE 1

CTDB Remix I: Dreaming the Fantasy

Amitay Isaacs <amitay@samba.org>

Samba Team
IBM (Australia Development Labs, Linux Technology Center)

SambaXP 2017

Amitay Isaacs CTDB Remix - Dreaming the Fantasy

SLIDE 2

CTDB Project

Motivation: Support for clustered Samba
  - Multiple nodes active simultaneously
  - Communication between nodes (heartbeat, failover)
  - Distributed databases between nodes

Features:
  - Volatile and Persistent databases
  - Cluster-wide messaging for Samba
  - IP failover and load balancing
  - Service monitoring

Community:
  - ctdb.samba.org
  - git.samba.org/samba.git
  - wiki.samba.org/index.php/CTDB_and_Clustered_Samba

SLIDE 3

Overview

Dreaming the Fantasy
  - How did we get here?
  - Evolving the design
  - Laying the foundations
  - New Architecture

Designing the Reality

SLIDE 4

How did we get here?

SLIDE 6

2013 Recap

SambaXP 2013
  - Introduced lock helper, event helper

SLIDE 7

2014 Recap

CTDB merges with Samba
  - CTDB merged into Samba tree (Nov 2013)
  - CTDB standalone waf build (Jun 2014)
  - CTDB build integrated in toplevel build (Nov 2014)

SLIDE 9

2015 Recap

  - Parallel Database Recovery
  - Protocol marshalling
  - New abstractions
  - New Communication framework (tevent_req based async)
  - New Client Code
  - Database recovery helper
  - Re-implemented ctdb tool using new client API
  - Introduced natgw helper

SLIDE 11

2016 Recap

  - Introduced killtcp helper, lvs helper
  - Event daemon
  - New abstractions - run_proc, sock_daemon
  - Event Protocol
  - Event client code
  - Event handling daemon

SLIDE 12

Evolving the design

SLIDE 14

Evolving the design

Identifying CTDB functions

[Diagram: the recovery daemon and the ctdb daemon, annotated with the following functions]

Eventscripts, Monitor, Tunables, Attach/Detach, Freeze/Thaw, Vacuuming, Locking, Unix Socket, Transports, Controls, Messages, Calls, Disable/Enable, Healthy/Unhealthy, Ban/Unban, ReleaseIP, TakeIP, IPreallocated, Traverse, Migration, Transaction, Recovery, Recovery Lock, Election, Consistency, IP failover, IP allocation

SLIDE 15

Evolving the design

Grouping CTDB functions

[Diagram: the same functions grouped within the recovery daemon and the ctdb daemon; one group is labelled Protocol]

Attach/Detach, Freeze/Thaw, Traverse, Migrations, Transactions, Vacuuming, Locking, Election, Recovery, Consistency, Banning, TakeIP/ReleaseIP/IPreallocated, IP assignment, Disable/Enable, Eventscripts, Healthy/Unhealthy, Start/Stop, Monitor, Calls, Controls, Messages, Unix Socket, Transports

SLIDE 16

Evolving the design

New subsystems

[Diagram: the recovery daemon and the ctdb daemon divided into new subsystems]
  - Clustered database
  - Cluster manager
  - IP failover
  - Service manager
  - Protocol (appears twice in the diagram)

SLIDE 23

Evolving the design

Redesign of server code

First approach
  - Main concern is the transport
  - Develop parallel transport and proxies (scalability)
  - Convert CTDB transport to use proxy

Motivation
  - Avoid m*n*n connections
  - Unix datagrams (SambaXP 2015)
  - tmsgd - fd passing (SambaXP 2016)
  - Proxy design never took off
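
The slide does not say what m and n stand for. As a rough illustration only, assume m client processes on each of n nodes, with every client holding one connection to every node. The sketch below compares that count with a hypothetical proxy layout in which clients talk only to a local proxy and the proxies form a full mesh. Both function names and the proxy layout are assumptions made for this example, not the design presented in the talk.

```python
def direct_connections(m: int, n: int) -> int:
    """Assumed reading of "m*n*n": m clients on each of n nodes,
    each client connected to all n nodes."""
    return m * n * n


def proxied_connections(m: int, n: int) -> int:
    """Hypothetical proxy layout: clients connect only to their local
    proxy, and the n proxies are fully meshed."""
    local = m * n               # one local connection per client
    mesh = n * (n - 1) // 2     # one link per pair of node proxies
    return local + mesh


if __name__ == "__main__":
    for m, n in [(10, 4), (100, 16)]:
        print(m, n, direct_connections(m, n), proxied_connections(m, n))
```

Under these assumptions the direct count grows with the square of the node count, while the proxied count is dominated by the number of local clients.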

SLIDE 33

Evolving the design

Redesign of server code

Second approach
  - Split code out and create separate daemons
  - Avoid boilerplate, most daemons to use unix domain sockets
  - sock_daemon abstraction
  - First candidate: event daemon

eventd
  - Does not need any CTDB infrastructure
  - run_proc abstraction
  - New protocol - request, reply
  - Easy testing
  - run_event abstraction
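
To make the request/reply idea concrete, here is a minimal sketch of a daemon answering one request over a Unix domain socket. It is written in Python for brevity and is not the eventd protocol or the sock_daemon API: the header layout (request id plus payload length), the function names and the handler are all assumptions chosen for illustration.

```python
import socket
import struct
import threading

# Hypothetical wire header: request id and payload length, network byte order.
HEADER = struct.Struct("!II")


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes or raise if the peer goes away."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_packet(sock: socket.socket, reqid: int, payload: bytes) -> None:
    sock.sendall(HEADER.pack(reqid, len(payload)) + payload)


def recv_packet(sock: socket.socket):
    reqid, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return reqid, recv_exact(sock, length)


def serve_one(sock: socket.socket, handler) -> None:
    """Daemon side: read one request and reply with the same request id."""
    reqid, payload = recv_packet(sock)
    send_packet(sock, reqid, handler(payload))


def call(sock: socket.socket, reqid: int, payload: bytes) -> bytes:
    """Client side: send a request and wait for the matching reply."""
    send_packet(sock, reqid, payload)
    reply_id, reply = recv_packet(sock)
    if reply_id != reqid:
        raise RuntimeError("reply does not match request")
    return reply


def demo() -> bytes:
    # A connected pair of Unix domain sockets stands in for client and daemon.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    worker = threading.Thread(
        target=serve_one, args=(server, lambda p: b"ran:" + p)
    )
    worker.start()
    try:
        return call(client, 1, b"monitor")
    finally:
        worker.join()
        client.close()
        server.close()
```

Because the daemon side needs nothing beyond a socket and a handler, it can be exercised in isolation, which is in the spirit of the "easy testing" point on the slide.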

SLIDE 34

Laying the foundations

SLIDE 36

Laying the foundations

CTDB State management
  - Solved differently for different things
      - tickles - Protocol to sync tickle lists
      - nfs locks - Using persistent database
  - Persistent databases are slow
  - We are in the business of clustered databases
  - New database model?

Replicated database
  - State information needed during lifetime of CTDB
  - Volatile (CLEAR_IF_FIRST)
  - Replicated (Re-use existing API)
  - Uses g_lock and transactions
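
The replicated model can be pictured with a toy in-memory sketch: every node keeps a full copy, writes go through a transaction applied to all copies under one global lock, and the contents do not survive a cluster restart. The class and method names below are hypothetical, a threading lock stands in for a cluster-wide lock, and none of this reflects the actual CTDB implementation.

```python
import threading


class ReplicatedDB:
    """Toy model: every node holds a full in-memory copy of the data."""

    def __init__(self, num_nodes: int) -> None:
        self.replicas = [dict() for _ in range(num_nodes)]
        # Stand-in for a cluster-wide lock (assumption for this sketch).
        self.global_lock = threading.Lock()

    def transaction(self, updates: dict) -> None:
        """Apply all updates to every replica while holding the global lock."""
        with self.global_lock:
            for replica in self.replicas:
                replica.update(updates)

    def fetch(self, node: int, key: str):
        """Reads are local: any node answers from its own copy."""
        return self.replicas[node].get(key)

    def restart_cluster(self) -> None:
        """Volatile semantics, loosely modelled on CLEAR_IF_FIRST:
        contents are dropped when the whole cluster restarts."""
        with self.global_lock:
            for replica in self.replicas:
                replica.clear()
```

A short usage example: after `db = ReplicatedDB(3)` and `db.transaction({"tickle": "10.0.0.1"})`, calling `db.fetch(node, "tickle")` returns the same value on every node until `db.restart_cluster()` is called.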

SLIDE 45

Laying the foundations

Database Performance
2 nodes (Intel Xeon E5620, RHEL6), 30 second test

                          on disk   tmpfs
g_lock test   fcntl          5718    5750
              mutexes        7573    7893
persistent    fcntl            11      11
              mutexes          11      11
replicated    fcntl           583     619
              mutexes          80      89

Why are transactions with mutexes so slow?
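
The relative differences are easier to see as ratios. The sketch below copies the figures from this slide (the slide gives no units, so none are assumed) and divides one row by another; the `ratio` helper and the dictionary layout are just a convenience for this example.

```python
# (on disk, tmpfs) results copied from the slide; the slide gives no units.
RESULTS = {
    ("g_lock test", "fcntl"): (5718, 5750),
    ("g_lock test", "mutexes"): (7573, 7893),
    ("persistent", "fcntl"): (11, 11),
    ("persistent", "mutexes"): (11, 11),
    ("replicated", "fcntl"): (583, 619),
    ("replicated", "mutexes"): (80, 89),
}
COLUMNS = {"on disk": 0, "tmpfs": 1}


def ratio(numerator, denominator, column="on disk"):
    """Return RESULTS[numerator] / RESULTS[denominator] for one column."""
    idx = COLUMNS[column]
    return RESULTS[numerator][idx] / RESULTS[denominator][idx]


if __name__ == "__main__":
    # Replicated versus persistent databases, both using fcntl locks.
    print(ratio(("replicated", "fcntl"), ("persistent", "fcntl")))
    # Replicated databases: mutexes versus fcntl locks.
    print(ratio(("replicated", "mutexes"), ("replicated", "fcntl")))
```

With fcntl locking on disk, the replicated figure is 53 times the persistent one, while switching the replicated database to mutexes drops it to roughly 14% of the fcntl figure, which is the anomaly the closing question points at.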

SLIDE 56

Laying the foundations

Node-to-node communication

CTDB_REQ_MESSAGE
  - Can carry arbitrary data
  - Used by recovery daemon, samba
  - SRVID based

Several issues
  - Multiple processes can get same message
  - Fire and forget

Tunnels
  - New packet type CTDB_REQ_TUNNEL
  - Uses existing CTDB transport
  - Register tunnels with tunnel id (new controls)
  - Client API to encapsulate packets
  - New daemons can use new protocol
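
The contrast between the two delivery models can be sketched with two toy dispatchers. This is illustrative only and is not the CTDB client API: the class names are invented, and the sketch assumes that a tunnel id has a single registered endpoint and that a request can produce a reply, which the slide implies but does not state.

```python
from collections import defaultdict


class MessageBus:
    """Toy SRVID-style delivery: every handler registered for an id gets a
    copy, and the sender receives nothing back (fire and forget)."""

    def __init__(self) -> None:
        self.handlers = defaultdict(list)

    def register(self, srvid: int, handler) -> None:
        self.handlers[srvid].append(handler)

    def send(self, srvid: int, data: bytes) -> None:
        for handler in self.handlers[srvid]:
            handler(data)


class TunnelRegistry:
    """Toy tunnel table: one endpoint per tunnel id (assumption), and a
    request returns the endpoint's reply."""

    def __init__(self) -> None:
        self.endpoints = {}

    def register(self, tunnel_id: int, handler) -> None:
        if tunnel_id in self.endpoints:
            raise ValueError(f"tunnel {tunnel_id} is already registered")
        self.endpoints[tunnel_id] = handler

    def request(self, tunnel_id: int, data: bytes) -> bytes:
        return self.endpoints[tunnel_id](data)
```

In the first model two processes registering the same id both see the message and the sender learns nothing; in the second a duplicate registration is rejected and the caller gets an answer.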

SLIDE 62

Laying the foundations

Rethinking subsystem design
  - Loose coupling to CTDB
      - Notifications (subsystem →)
      - Actions (subsystem ←)
  - State transition graphs
  - Candidates
      - Cluster Manager
      - Service Monitoring
      - IP Failover
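
One way to read the arrows is that actions flow into a subsystem and notifications flow out of it, with a state transition graph deciding which notification follows which action. The sketch below encodes that reading as a small table-driven state machine. The states, actions and notification names are hypothetical and were chosen only for this example.

```python
# (current state, action) -> (next state, notification); all names hypothetical.
TRANSITIONS = {
    ("stopped", "start"): ("running", "started"),
    ("running", "stop"): ("stopped", "stopped"),
    ("running", "fail"): ("unhealthy", "became_unhealthy"),
    ("unhealthy", "recover"): ("running", "became_healthy"),
}


class Subsystem:
    """Loosely coupled subsystem: actions come in, notifications go out."""

    def __init__(self, notify) -> None:
        self.state = "stopped"
        self.notify = notify  # callback supplied by the caller

    def act(self, action: str) -> None:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        self.state, notification = TRANSITIONS[key]
        self.notify(notification)
```

Keeping the graph in a table means the caller only needs the action names and the notification callback, which is one way to keep the coupling loose.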

SLIDE 63

New Architecture

SLIDE 69

Cluster Manager

  - Split cluster manager code
  - Keep it loosely coupled
  - Support 3rd party replacements (e.g. etcd)

[Diagram: Node 0, Node 1 and Node 2, each with CTDB, Glue and Cluster Manager]

SLIDE 77

Service Manager and IP Failover

  - Separate service manager code (eventd + ...)
  - Separate IP failover code
  - IP failover as a service

[Diagram: Node 0, Node 1 and Node 2, each with CTDB, Service Manager and IP Failover]

SLIDE 78

Questions / Comments
