SLIDE 1

CTDB Stories

Amitay Isaacs amitay@samba.org

Samba Team IBM (Australia Development Labs, Linux Technology Center)

Amitay Isaacs CTDB Stories

SLIDE 2

CTDB Project

Motivation: Support for clustered Samba
- Multiple nodes active simultaneously
- Communication between nodes (heartbeat, failover)
- Share databases between nodes

Features:
- Volatile and persistent databases
- IP failover and load balancing
- Service monitoring

Community:
- http://ctdb.samba.org
- git://git.samba.org/ctdb.git, git://git.samba.org/samba.git

SLIDE 3

Headlines

- Merging CTDB tree in Samba tree
- Development Stories
  - High hopcount bug
  - Getting lock scheduling right
  - All nodes banned on single node failure
- Regression Stories
  - Real time or not
  - Fixing compiler warnings

SLIDE 8

Story of the Merge

SambaXP 2013: Merge CTDB in Samba tree?
- Remove duplication of talloc, tdb, tevent, replace libraries
- Autobuild testing of clustered Samba
- Leverage off Samba release process
- Attract more developers

Nov 2013: CTDB tree merged with Samba

SambaXP 2014: To Do
- Create waf build for CTDB, Clustered Samba
- Set up a clustered Samba instance for autobuild
- Split monolithic code

SLIDE 17

Story of the Merge

Step 1: Convert CTDB autoconf build to waf build
- Finished implementation before reaching Australia

Step 2: Integrate CTDB build into toplevel build
- lib/util has diverged
- Can’t get rid of ctdb/lib/util
- Start hacking lib/util
- Gave up! Too long for a plane trip.

June 2014: CTDB standalone waf build committed.

SLIDE 28

Story of the Merge

Martin takes over
- Remove dependency on includes.h
- Untangle functions & dependencies ...
  - idtree.c depends on lib/crypto
  - util.c depends on charset
- Factor out samba-util-core from samba-util to avoid pulling in non-library code.
- Clean up ctdb/lib/util
- Clean up CTDB logging
- Create new subsystem ctdb-util
- Drop CTDB log ringbuffer, adopt lib/util/debug.[ch]
- Replace dependency on ctdb-util with samba-util
- Hook CTDB into top level using --with-cluster-support

SLIDE 30

Story of the Merge

November 2014 CTDB build integrated into toplevel build.

SLIDE 31

CTDB Releases

2.5.4 (September 2014) - 156 patches
- Support for TDB robust mutexes
- Add ctdb detach
- Avoid running ctdb helpers at real-time priority
- Improved vacuuming performance

2.5.5 (April 2015) - 119 patches
- Fix handling of IPv6 addresses
- Fix regression in socket handling code
- Make statd-callout scalable

SLIDE 32

Developers

Contributions in 2014

  196  Martin Schwenke
  184  Amitay Isaacs
   55  Michael Adam
   10  Volker Lendecke
    3  Srikrishan Malik
    3  Andrew Bartlett
    2  Stefan Metzmacher
    2  Gregor Beck
    2  Bjorn Baumbach
    1  Matthias Dieter Wallnofer
    1  Jeremy Allison
    1  Ira Cooper
    1  David Disseldorp

SLIDE 33

Developers

Contributions since Jan 2015

  118  Martin Schwenke
   15  Amitay Isaacs
   12  Volker Lendecke
    3  Rajesh Joseph
    1  Michael Adam
    1  Led
    1  Jelmer Vernooij
    1  David Disseldorp
    1  Christof Schmitt

SLIDE 36

High hopcount bug

Problem: Logs filled with entries like:

    ctdbd: High hopcount 2823099 dbid:0x7a19d84d key:0x6f9f65c4

    static void ctdb_call_send_redirect(ctdb, ctdb_db, key, c, header)
    {
        uint32_t lmaster = ctdb_lmaster(ctdb, &key);

        /* Forward to the LMASTER; if we are the LMASTER, forward to the
           node our record header names as DMASTER. */
        c->hdr.destnode = lmaster;
        if (ctdb->pnn == lmaster) {
            c->hdr.destnode = header->dmaster;
        }
        c->hopcount++;
        /* Warn on hopcounts 96-99 of every hundred. */
        if (c->hopcount%100 > 95) {
            DEBUG(DEBUG_WARNING,("High hopcount ..."));
        }
        ctdb_queue_packet(ctdb, &c->hdr);
    }

SLIDE 43

High hopcount bug

Record Migration

Record: Node 1 is LMASTER, Node 2 is DMASTER
1. Request for record received on Node 0 (REQ CALL)
2. Request redirected to Node 1 (REQ CALL)
3. Request redirected to Node 2 (REQ CALL)
4. Reply to Node 1 (DMASTER REQ)
5. Reply to Node 0 (DMASTER REPLY)
6. Reply to Client (REPLY CALL)

SLIDE 53

High hopcount bug

Debugging
- Noticed after fixes for vacuuming/recovery interaction bug
- The problem was hard to reproduce
- Many times the problem resolved itself

Suspects
- Two requests chasing each other
- Record header corruption
- Fixes for vacuuming/recovery interaction bug
  - Did identify a few issues in the fixes
  - However, the problem did not go away
- Locking code was being modified

SLIDE 62

High hopcount bug

Instrument record request processing code
- Node 1 is the DMASTER for a record (hash 0x0aa13d47)
- Record is getting updated regularly on Node 1

    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9620] dmaster[1]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9621] dmaster[1]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9622] dmaster[1]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9623] dmaster[1]

- Node 0 requests the record. Node 1 updates DMASTER.

    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9640] dmaster[1]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9641] dmaster[1]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9641] dmaster[0]

- And Node 1 migrates the record to Node 0
- On Node 0 CTDB tries to grab the record lock
  - Cannot get a lock in non-blocking mode
  - Creates a lock request

SLIDE 66

High hopcount bug

Meanwhile, more record requests queue up

    Waiting reqid:732 key:0x0aa13d47
    Waiting reqid:684 key:0x0aa13d47
    Waiting reqid:715 key:0x0aa13d47
    Waiting reqid:701 key:0x0aa13d47

Soon after, high hopcount messages are logged on Node 0

    High hopcount 97 key:0x0aa13d47 reqid=00004771 pnn:0 src:1 lmaster:1
    High hopcount 99 key:0x0aa13d47 reqid=00004771 pnn:0 src:1 lmaster:1
    High hopcount 196 key:0x0aa13d47 reqid=000039f9 pnn:0 src:0 lmaster:1
    High hopcount 198 key:0x0aa13d47 reqid=000039f9 pnn:0 src:0 lmaster:1

These record requests bounce very quickly. After 2 seconds:

    High hopcount 955596 key:0x0aa13d47 reqid=000039f9 pnn:0 src:0 lmaster:1
    High hopcount 955598 key:0x0aa13d47 reqid=000039f9 pnn:0 src:0 lmaster:1
    High hopcount 955597 key:0x0aa13d47 reqid=00004771 pnn:0 src:1 lmaster:1
    High hopcount 955599 key:0x0aa13d47 reqid=00004771 pnn:0 src:1 lmaster:1
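The hopcounts in these logs follow from the `hopcount%100 > 95` test in the redirect code: only counts ending in 96 to 99 are reported. The sketch below is an illustration, not CTDB code. It assumes a request bouncing between two nodes passes through any one node on every second hop, which is one way to read why each reqid shows only odd or only even counts on Node 0. The helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same condition as the warning in the redirect code. */
static bool hopcount_is_logged(uint32_t hopcount)
{
    return hopcount % 100 > 95;
}

/* Count the warnings one node would emit for a request it handles at
 * hopcounts first, first + step, ... up to and including last.
 * (Hypothetical helper; step 2 models a two-node bounce.) */
static uint32_t warnings_seen(uint32_t first, uint32_t last, uint32_t step)
{
    uint32_t count = 0;

    assert(step > 0);
    for (uint32_t h = first; h <= last; h += step) {
        if (hopcount_is_logged(h)) {
            count++;
        }
    }
    return count;
}
```

Under that assumption a node logs two warnings per hundred hops for each bouncing request, so the log volume grows quickly once the hopcount reaches the hundreds of thousands.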

SLIDE 72

High hopcount bug

Some time later the migrated record request gets processed

    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9642] dmaster[0]
    UPDATE db[notify_index.tdb]: store: hash[0x0aa13d47] rsn[9643] dmaster[0]

And the bouncing requests stop.

Temporary inconsistency during record migration
- Node 0 says Node 1 is DMASTER
- Node 1 says Node 0 is DMASTER

Solution: Avoid processing record requests for a record in migration

SLIDE 80

Getting Lock Scheduling Right

Locks in CTDB
- Record locks
  - To modify a record, CTDB tries to grab a non-blocking lock
  - If that fails, create a lock request
- Database locks
  - For database recovery, CTDB needs to freeze all databases

Why lock scheduling
- Multiple requests for different records
- Multiple requests for same record
- There are multiple databases
- Freeze requests are handled independently
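The "try non-blocking, else queue a lock request" step can be sketched with a plain POSIX byte-range lock. This is only an illustration of the pattern. CTDB goes through the TDB locking API rather than calling `fcntl` directly, and `try_write_lock` is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to take an exclusive lock on one byte of fd without blocking.
 * Returns 1 if the lock was taken, 0 if another process holds a
 * conflicting lock, and -1 on any other error. */
static int try_write_lock(int fd, off_t offset)
{
    struct flock fl = {0};

    assert(offset >= 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;

    if (fcntl(fd, F_SETLK, &fl) == 0) {
        return 1;
    }
    if (errno == EACCES || errno == EAGAIN) {
        return 0;   /* busy: the caller would create a lock request */
    }
    return -1;
}
```

A return of 0 is the case where the daemon must not wait in its main loop, so the wait is handed to a separate lock request.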

SLIDE 88

Getting Lock Scheduling Right

New locking API abstraction - Naive approach
- Same API for record lock request and database lock request
- Queues for active and pending lock requests
- Maximum number of active lock requests
- Create a child process to lock the record

Mostly works ...

Problem: ... till database recovery is triggered under load

Solution
- Active queue is full and freeze lock requests are pending
- Freeze lock requests need to be scheduled immediately

SLIDE 94

Getting Lock Scheduling Right

Problem: Performance is not good when record locking is in use

Solution
- A single limit on active records kills performance for locking requests across multiple databases
- Implement per database limits for active lock requests

Problem: There are multiple lock processes waiting for the same record

Solution
- Rely on kernel to do “fair scheduling”
- Before scheduling a lock request, check if there is an active lock request for the same record
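Both fixes can be pictured as a per-database admission check. The sketch below is an assumption-laden illustration: the limit of 4, the struct layout and the function names are invented, and records are identified by key hash alone for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIVE_PER_DB 4   /* hypothetical per-database limit */

struct db_locks {
    uint32_t active_keys[MAX_ACTIVE_PER_DB]; /* key hashes being locked */
    size_t num_active;
};

static bool key_is_active(const struct db_locks *db, uint32_t key_hash)
{
    for (size_t i = 0; i < db->num_active; i++) {
        if (db->active_keys[i] == key_hash) {
            return true;
        }
    }
    return false;
}

/* Start a lock request only if this database has a free slot and no
 * active request already waits on the same record. */
static bool try_start(struct db_locks *db, uint32_t key_hash)
{
    assert(db != NULL);
    if (db->num_active >= MAX_ACTIVE_PER_DB || key_is_active(db, key_hash)) {
        return false;   /* stays in the pending queue */
    }
    db->active_keys[db->num_active++] = key_hash;
    return true;
}
```

Because each database has its own counter, a busy database no longer blocks lock requests for the others, and at most one lock process waits per record.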

SLIDE 100

Getting Lock Scheduling Right

Problem: CTDB is consuming 100% CPU under heavy load

Solution
- Active and pending lock queues are implemented as linked lists
- CTDB is spinning trying to schedule next request (60k requests in pending queue)
- Undo active lock checking?
- Implement per database queues, not sufficient!

Better Solution
- Use better data structure for checking active lock requests
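The slide does not name the data structure that was chosen. As one possible illustration, a chained hash set keyed on database id and key hash makes the "is this record already being locked?" check roughly constant time instead of a list walk. All names and the bucket count are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024

struct active_node {
    uint32_t db_id;
    uint32_t key_hash;
    struct active_node *next;
};

struct active_set {
    struct active_node *bucket[NBUCKETS];
};

static unsigned bucket_of(uint32_t db_id, uint32_t key_hash)
{
    return (db_id ^ key_hash) % NBUCKETS;
}

static bool active_contains(const struct active_set *s,
                            uint32_t db_id, uint32_t key_hash)
{
    const struct active_node *n = s->bucket[bucket_of(db_id, key_hash)];

    for (; n != NULL; n = n->next) {
        if (n->db_id == db_id && n->key_hash == key_hash) {
            return true;
        }
    }
    return false;
}

/* Returns false if the entry is already present or allocation fails. */
static bool active_add(struct active_set *s, uint32_t db_id, uint32_t key_hash)
{
    unsigned b = bucket_of(db_id, key_hash);
    struct active_node *n;

    assert(s != NULL);
    if (active_contains(s, db_id, key_hash)) {
        return false;
    }
    n = malloc(sizeof(*n));
    if (n == NULL) {
        return false;
    }
    n->db_id = db_id;
    n->key_hash = key_hash;
    n->next = s->bucket[b];
    s->bucket[b] = n;
    return true;
}

static void active_remove(struct active_set *s, uint32_t db_id, uint32_t key_hash)
{
    struct active_node **pp = &s->bucket[bucket_of(db_id, key_hash)];

    while (*pp != NULL) {
        struct active_node *n = *pp;
        if (n->db_id == db_id && n->key_hash == key_hash) {
            *pp = n->next;
            free(n);
            return;
        }
        pp = &n->next;
    }
}
```

With 60k pending requests, the scheduler can then skip requests for busy records without rescanning the active list for each one.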

SLIDE 108

All nodes banned on single node failure

Observation A node becomes INACTIVE (disconnected, stopped or banned) CTDB tries to freeze databases for recovery and fails CTDB retries and bans culprit node Eventually ends up banning all remaining nodes If locking database fails, CTDB logs useful information

All processes holding locks on CTDB database Stack traces for all those processes Relies on parsing /proc/locks

Amitay Isaacs CTDB Stories

slide-109
SLIDE 109

All nodes banned on single node failure

Observation A node becomes INACTIVE (disconnected, stopped or banned) CTDB tries to freeze databases for recovery and fails CTDB retries and bans culprit node Eventually ends up banning all remaining nodes If locking database fails, CTDB logs useful information

All processes holding locks on CTDB database Stack traces for all those processes Relies on parsing /proc/locks

Cannot be used with TDB robust mutexes

Amitay Isaacs CTDB Stories

slide-110
SLIDE 110

All nodes banned on single node failure

Observation:
A node becomes INACTIVE (disconnected, stopped or banned)
CTDB tries to freeze databases for recovery and fails
CTDB retries and bans culprit node
Eventually ends up banning all remaining nodes
If locking database fails, CTDB logs useful information
  All processes holding locks on CTDB database
  Stack traces for all those processes
  Relies on parsing /proc/locks
    Cannot be used with TDB robust mutexes
    Recreate after disabling TDB robust mutexes
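
To make the /proc/locks step concrete, here is a small sketch of how lock holders on a database file could be found from that file. It assumes the usual Linux layout of /proc/locks (id, lock type, ADVISORY/MANDATORY, READ/WRITE, pid, major:minor:inode, start, end, with "->" marking a blocked waiter). The function names and the sample text in the usage note are made up for illustration, and this is not the script CTDB ships.

```python
def parse_proc_locks(text):
    """Parse /proc/locks text into a list of dicts, one per lock entry."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        blocked = "->" in fields           # "->" marks a waiter, not a holder
        fields = [f for f in fields if f != "->"]
        if len(fields) < 8:
            continue                       # skip anything we do not recognise
        _major, _minor, inode = fields[5].split(":")
        entries.append({
            "pid": int(fields[4]),
            "inode": int(inode),
            "access": fields[3],
            "start": fields[6],
            "end": fields[7],
            "blocked": blocked,
        })
    return entries


def lock_holders(text, inode):
    """Return the sorted pids that hold (not wait for) a lock on inode."""
    return sorted({e["pid"] for e in parse_proc_locks(text)
                   if e["inode"] == inode and not e["blocked"]})
```

Calling lock_holders(open("/proc/locks").read(), os.stat(path).st_ino) would then list the processes to collect stack traces from. Locks taken through TDB robust mutexes never appear in this file, which matches the limitation noted on the slide.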


slide-111
SLIDE 111

All nodes banned on single node failure

CTDB fails to freeze smbXsrv_session_global.tdb

ctdbd-lock: /usr/bin/ctdb_lock_helper smbXsrv_session_global.tdb.0 168 223318
ctdbd-lock: /usr/bin/ctdb_lock_helper smbXsrv_tcon_global.tdb.0 168 EOF
ctdbd-lock: /usr/sbin/smbd smbXsrv_tcon_global.tdb.0 251880 251880 W
ctdbd-lock: /usr/bin/ctdb_lock_helper locking.tdb.0 168 EOF
ctdbd-lock: /usr/bin/ctdb_lock_helper smbXsrv_open_global.tdb.0 168 EOF
ctdbd-lock: /usr/bin/ctdb_lock_helper cnscm_monitoring.tdb.0 168 EOF
ctdbd-lock: /usr/sbin/smbd smbXsrv_session_global.tdb.0 223320 223320

Samba process is holding a lock


slide-114
SLIDE 114

All nodes banned on single node failure

Stack trace for relevant samba process

#0 0x00007fde05236218 in poll () from /lib64/libc.so.6
#1 0x00007fde0863a93c in poll_one_fd ()
#2 0x00007fde0861146b in ctdb_packet_fd_read_sync_timeout ()
#3 0x00007fde08611c0d in ctdb_packet_fd_read_sync ()
#4 0x00007fde086126fa in ctdb_read_req ()
#5 0x00007fde08612eae in ctdbd_parse ()
#6 0x00007fde0862184d in db_ctdb_parse_record ()
#7 0x00007fde0861d9d4 in dbwrap_parse_record ()
#8 0x00007fde0861dc2a in dbwrap_fetch ()
#9 0x00007fde086250fd in dbwrap_watch_record_stored ()
#10 0x00007fde0861dc86 in dbwrap_record_delete ()
#11 0x00007fde083887bd in smbXsrv_session_logoff ()
#12 0x00007fde083892aa in smbXsrv_session_logoff_all_callback ()
#13 0x00007fde08626389 in db_rbt_traverse_internal ()
#14 0x00007fde086264da in db_rbt_traverse ()
#15 0x00007fde0861d96a in dbwrap_traverse ()
#16 0x00007fde08389918 in smbXsrv_session_logoff_all ()
#17 0x00007fde088e41a0 in exit_server_common ()
#18 0x00007fde088e462e in smbd_exit_server_cleanly ()
#19 0x00007fde083609e2 in exit_server_cleanly ()


slide-115
SLIDE 115

All nodes banned on single node failure

Samba is holding a record lock (smbXsrv_session_global.tdb)
And waiting for another record (dbwatchers.tdb)
CTDB is in the process of migrating the record
At this time CTDB on the remote node becomes INACTIVE
CTDB has to perform database recovery
CTDB starts to freeze databases
CTDB cannot lock smbXsrv_session_global.tdb
Deadlock!
Since CTDB cannot freeze databases, it will ban the culprit
Multiple Samba processes holding a lock on different nodes
All nodes get banned!
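
The chain above can be read as a wait-for graph in which every party waits on the next one, and the last one waits on the first. The sketch below encodes that reading and finds the cycle. The node labels are informal shorthand chosen for this example, not CTDB identifiers, and the single-successor graph is a simplification.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    for start in waits_for:
        path, node = [start], waits_for.get(start)
        # Follow the chain until it ends or revisits a node on the path.
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[path.index(node):] + [node]
    return None


# Who waits for what, following the sequence of events on the slide.
waits_for = {
    "smbd": "dbwatchers record",
    "dbwatchers record": "record migration",
    "record migration": "database recovery",
    "database recovery": "freeze of session db",
    "freeze of session db": "smbd",   # smbd still holds a session record lock
}

cycle = find_cycle(waits_for)
```

Removing any single edge breaks the cycle. That is the lever a fix has to pull: either smbd gives up its lock, or recovery stops requiring every database to be frozen first.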


slide-127
SLIDE 127

All nodes banned on single node failure

Problem:
CTDB cannot freeze database since Samba is holding a lock
Samba will not release a lock until it gets the second lock
CTDB database recovery is serial
  Freeze all databases
  Recover databases one by one
  Unlock all databases

Solution: Do database recovery in parallel
  Start freeze of all databases
  As soon as database is frozen, recover database
  Process all pending call requests for that database
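
A small simulation may help show why the parallel scheme gets unstuck. It is only a sketch under stated assumptions: the freeze of smbXsrv_session_global.tdb is modelled as blocked until smbd releases its record lock, and smbd is modelled as releasing it once the pending calls on dbwatchers.tdb are processed. The function names and callbacks are hypothetical and do not mirror CTDB's recovery code.

```python
import asyncio


async def recover_in_parallel(dbs, freeze, recover, process_pending):
    """Freeze every database concurrently; finish each one independently."""
    async def one(db):
        await freeze(db)             # may block while a client holds a lock
        await recover(db)            # recover as soon as this db is frozen
        await process_pending(db)    # then answer calls queued for this db
    await asyncio.gather(*(one(db) for db in dbs))


def demo():
    """Simulate smbd holding a session record while waiting on dbwatchers."""
    async def main():
        smbd_released = asyncio.Event()
        log = []

        async def freeze(db):
            if db == "smbXsrv_session_global.tdb":
                await smbd_released.wait()   # smbd still holds a record lock
            log.append(("frozen", db))

        async def recover(db):
            log.append(("recovered", db))

        async def process_pending(db):
            if db == "dbwatchers.tdb":
                smbd_released.set()          # smbd gets its record, unlocks

        dbs = ["smbXsrv_session_global.tdb", "dbwatchers.tdb"]
        await asyncio.wait_for(
            recover_in_parallel(dbs, freeze, recover, process_pending),
            timeout=1,
        )
        return log

    return asyncio.run(main())
```

In this model dbwatchers.tdb is frozen, recovered and drained first, which lets the session database freeze complete. A serial "freeze all, then recover" loop over the same callbacks would wait forever on the first freeze.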


slide-134
SLIDE 134

Real time or not

Background:
CTDB runs with real-time priority
CTDB creates lots of processes
ctdb_fork() - reset process priority
fork() is found to be expensive on busy systems
Replace fork() with vfork() and exec*()
Introduce helper processes - ctdb_event_helper

Regression:
All event scripts now run with real-time priority
CTDB_MANAGES_SAMBA=yes
In 50.samba, startup event starts smbd
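
The regression follows from scheduling policy being inherited across fork/vfork and exec: a helper started by a real-time daemon stays real-time unless something resets it, and so does everything the helper starts, including smbd. The slide does not spell out the remedy, so the following is only a sketch of one plausible approach on Linux, resetting the policy in the child before the script runs. The function names are hypothetical.

```python
import os
import subprocess
import sys


def reset_to_normal_priority():
    """Drop any inherited real-time policy; runs in the child before exec."""
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))


def run_event_script(argv):
    """Run argv as a child that starts with the default scheduling policy."""
    return subprocess.run(argv, preexec_fn=reset_to_normal_priority,
                          capture_output=True, text=True, check=True)


def child_policy():
    """Report the scheduling policy a freshly started child ends up with."""
    code = "import os; print(os.sched_getscheduler(0))"
    return int(run_event_script([sys.executable, "-c", code]).stdout)
```

With this in place child_policy() reports os.SCHED_OTHER even when the parent runs under SCHED_FIFO or SCHED_RR, so anything the script launches starts at normal priority.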


slide-145
SLIDE 145

Fixing compiler warnings

Background:
CTDB sets up pipe from a child process
  So child process can send the status via pipe
  Pipe close indicates failure of child
Many read()/write() calls without checking return values
Replace all read()/write() with sys_read()/sys_write()

Regression:
While testing on VMs, CTDB consuming 100% CPU
Tracing shows CTDB is stuck busy-looping in sys_write()
Samba not getting scheduled to read from CTDB
If a write() call fails with EAGAIN, back off
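
The failure mode is a write wrapper that retries immediately on EAGAIN: on a non-blocking descriptor whose reader is starved, the writer spins at full CPU (and, running at real-time priority, can keep the reader from ever being scheduled). A minimal sketch of backing off instead, assuming the back-off takes the form of sleeping in select() until the descriptor is writable. The function name and timeout are illustrative, and this is not the actual sys_write() change.

```python
import os
import select


def write_all(fd, data, timeout=1.0):
    """Write all of data to a non-blocking fd without spinning on EAGAIN."""
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except BlockingIOError:
            # EAGAIN: the pipe is full. Sleep until the reader drains it
            # instead of retrying in a tight loop.
            _, writable, _ = select.select([], [fd], [], timeout)
            if not writable:
                raise TimeoutError("peer is not reading")
            continue
        view = view[n:]   # a short write is normal; send the remainder
```

Sleeping in select() hands the CPU back to the scheduler, which gives the reading process a chance to run.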


slide-156
SLIDE 156

Questions/Comments?
