A Few things happened on the way to LMDB, presented by Andrew Bartlett

SLIDE 1

Presented by Andrew Bartlett, Samba Team, Catalyst, June 2018

A Few things happened on the way to LMDB

Samba is a member project of the Software Freedom Conservancy

SLIDE 2

Andrew Bartlett and Catalyst’s Samba Team

  • Samba Developer since 2001
  • Based in Wellington, New Zealand
  • Team lead for the Catalyst Samba Team, including:

Garming Sam

Douglas Bagnall

Gary Lockyer

Tim Beale

Joe Guo

Aaron Haslett

Jamie McClymont

SLIDE 3

Not really a story about LMDB

  • LMDB pretty much did what it said on the tin
  • Instead LMDB taught us about Samba and LDB
  • Numerous locking issues found and fixed
  • A new key-value layer added
  • And so, so many tests

SLIDE 4

Customer request: 64-bit DB

  • Concerned that the 4GB DB could be filled too quickly

Wanting to store 100,000 users in a single domain

  • Main concern is the hard limit of TDB
  • LMDB chosen as a modern key-value store

Used in OpenLDAP

SLIDE 5

Timeline

  • LMDB was prototyped in Jan 2017

Garming Sam

Inspired by, but rewritten from, Jakob Hrozek’s earlier prototype

  • GUID-index LDB implemented in July/August 2017

LMDB imposes a maximum key length of 511 bytes

  • Primary development Jan → May 2018

Gary Lockyer

SLIDE 6

A new approach: Key/Value layer

  • Jakob’s approach was to copy ldb_tdb and modify it
  • Garming and I decided to instead add a key-value layer

Avoid code duplication

Allow more than just LMDB (perhaps LMDBx, LevelDB)

Share performance and correctness improvements with ldb_tdb

Like dbwrap in concept but specific to LDB needs

SLIDE 7

Key-value API

  • int (*store)(struct ltdb_private *ltdb, struct ldb_val key, struct ldb_val data, int flags);
  • int (*delete)(struct ltdb_private *ltdb, struct ldb_val key);
  • int (*iterate)(struct ltdb_private *ltdb, ldb_kv_traverse_fn fn, void *ctx);
  • int (*update_in_iterate)(struct ltdb_private *ltdb, struct ldb_val key, struct ldb_val key2, struct ldb_val data, void *ctx);
  • int (*fetch_and_parse)(struct ltdb_private *ltdb, struct ldb_val key, int (*parser)(struct ldb_val key, struct ldb_val data, void *private_data), void *ctx);
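The key-value layer is, in effect, a table of function pointers that ldb_tdb calls without knowing which store sits underneath. The sketch below is a cut-down, hypothetical model of that idea, not Samba's code: struct toy_private, struct toy_kv_ops and the fixed-array backend are invented for illustration, and only store and fetch_and_parse are modelled. struct ldb_val mirrors ldb's data-pointer-plus-length type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as ldb's struct ldb_val: a data pointer plus a length. */
struct ldb_val {
	uint8_t *data;
	size_t length;
};

/* Hypothetical backend state: a tiny fixed-size array store. */
#define TOY_SLOTS 8
#define TOY_MAX 64
struct toy_private {
	size_t count;
	uint8_t keys[TOY_SLOTS][TOY_MAX];
	size_t key_len[TOY_SLOTS];
	uint8_t vals[TOY_SLOTS][TOY_MAX];
	size_t val_len[TOY_SLOTS];
};

/* Cut-down ops table: callers only ever go through these pointers. */
struct toy_kv_ops {
	int (*store)(struct toy_private *db, struct ldb_val key,
		     struct ldb_val data, int flags);
	int (*fetch_and_parse)(struct toy_private *db, struct ldb_val key,
			       int (*parser)(struct ldb_val key, struct ldb_val data,
					     void *private_data),
			       void *ctx);
};

static int toy_find(const struct toy_private *db, struct ldb_val key)
{
	for (size_t i = 0; i < db->count; i++) {
		if (db->key_len[i] == key.length &&
		    memcmp(db->keys[i], key.data, key.length) == 0)
			return (int)i;
	}
	return -1;
}

static int toy_store(struct toy_private *db, struct ldb_val key,
		     struct ldb_val data, int flags)
{
	(void)flags; /* a real backend would honour insert/modify flags */
	if (key.length > TOY_MAX || data.length > TOY_MAX)
		return -1;
	int i = toy_find(db, key);
	if (i < 0) {
		if (db->count == TOY_SLOTS)
			return -1;
		i = (int)db->count++;
		memcpy(db->keys[i], key.data, key.length);
		db->key_len[i] = key.length;
	}
	memcpy(db->vals[i], data.data, data.length);
	db->val_len[i] = data.length;
	return 0;
}

static int toy_fetch_and_parse(struct toy_private *db, struct ldb_val key,
			       int (*parser)(struct ldb_val, struct ldb_val, void *),
			       void *ctx)
{
	assert(parser != NULL);
	int i = toy_find(db, key);
	if (i < 0)
		return -1; /* not found */
	struct ldb_val data = { db->vals[i], db->val_len[i] };
	/* The parser sees the stored bytes in place; no copy is returned. */
	return parser(key, data, ctx);
}

static const struct toy_kv_ops toy_ops = {
	.store = toy_store,
	.fetch_and_parse = toy_fetch_and_parse,
};

/* Example parser: records the value's length in private_data. */
static int record_length(struct ldb_val key, struct ldb_val data, void *private_data)
{
	(void)key;
	*(size_t *)private_data = data.length;
	return 0;
}

/* Store v under k via the ops table and read it back; returns length or -1. */
long toy_roundtrip(const char *k, const char *v)
{
	struct toy_private db = {0};
	struct ldb_val key = { (uint8_t *)k, strlen(k) };
	struct ldb_val val = { (uint8_t *)v, strlen(v) };
	size_t seen = 0;
	if (toy_ops.store(&db, key, val, 0) != 0)
		return -1;
	if (toy_ops.fetch_and_parse(&db, key, record_length, &seen) != 0)
		return -1;
	return (long)seen;
}
```

toy_roundtrip("DN=CN=abartlet", "hello") stores through the table and reads back through the parser callback, returning 5. A callback-style read lets a backend hand over its own buffer (for example a memory-mapped record) without copying it, which is one plausible reason for the fetch_and_parse shape.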

SLIDE 8

Locking API

  • Read Locks

int (*lock_read)(struct ldb_module *);

int (*unlock_read)(struct ldb_module *);

  • Transactions

int (*begin_write)(struct ltdb_private *);

int (*prepare_write)(struct ltdb_private *);

int (*abort_write)(struct ltdb_private *);

int (*finish_write)(struct ltdb_private *);

SLIDE 9

Meta API

  • int (*error)(struct ltdb_private *ltdb);
  • const char * (*errorstr)(struct ltdb_private *ltdb);
  • const char * (*name)(struct ltdb_private *ltdb);
  • bool (*has_changed)(struct ltdb_private *ltdb);
  • bool (*transaction_active)(struct ltdb_private *ltdb);

SLIDE 10

First Hurdle: Locking

  • Even the prototype found issues

Demonstrated the lack of whole-DB locking

Fixed for Samba 4.7 last year

  • Probably behind many of our replication issues
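Here whole-DB locking is taken to mean that a search holds read locks on every partition at once, so it sees one consistent view. A minimal sketch of the usual pattern, assuming hypothetical per-partition counters rather than real TDB/LMDB locks: always lock in the same order, and on failure release what was taken in reverse order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPART 4 /* hypothetical number of partitions (one DB file each) */

/* Hypothetical per-partition state: just a read-lock counter. */
struct toy_partition {
	int read_locks;
	bool fail_lock; /* test knob: make this partition's lock fail */
};

static int part_lock_read(struct toy_partition *p)
{
	if (p->fail_lock)
		return -1;
	p->read_locks++;
	return 0;
}

static void part_unlock_read(struct toy_partition *p)
{
	assert(p->read_locks > 0);
	p->read_locks--;
}

/* Lock every partition in one fixed order so a search sees a single
 * consistent view; on failure, drop the locks already taken, newest first. */
int lock_read_all(struct toy_partition parts[], size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (part_lock_read(&parts[i]) != 0) {
			while (i-- > 0)
				part_unlock_read(&parts[i]);
			return -1;
		}
	}
	return 0;
}

/* Release in the reverse order of acquisition. */
void unlock_read_all(struct toy_partition parts[], size_t n)
{
	while (n-- > 0)
		part_unlock_read(&parts[n]);
}

/* Counts read locks held right after lock_read_all() when partition
 * `failing` refuses its lock (pass NPART for "no failure"). */
int held_read_locks(size_t failing)
{
	struct toy_partition parts[NPART] = {{0}};
	if (failing < NPART)
		parts[failing].fail_lock = true;
	int ret = lock_read_all(parts, NPART);
	int held = 0;
	for (size_t i = 0; i < NPART; i++)
		held += parts[i].read_locks;
	if (ret == 0)
		unlock_read_all(parts, NPART);
	return held;
}
```

held_read_locks(NPART) reports 4 locks after a full acquisition; held_read_locks(2) reports 0, because the locks on partitions 0 and 1 are released when partition 2 fails.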
SLIDE 11

Second Hurdle: More Locking!

  • It just wouldn’t pass make test

More strange failures in replication

  • Unlock ordering issues in replication

highestCommittedUSN visible before the data

Fixes proposed for backport to Samba 4.7 and 4.8

  • Modification without locks (at startup) in Samba 4.8

DB-init time only, but not good

Added checks to key-value layer to prevent re-occurrence
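One way to catch an unlocked startup modification is a guard in the key-value layer that refuses writes outside a transaction. A minimal sketch, assuming hypothetical names (checked_store, the TOY_* codes); the check is modelled on the Meta API's transaction_active() but is not Samba's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch. */
enum { TOY_OK = 0, TOY_ERR_NO_TRANSACTION = 1 };

/* Stand-in for the backend's private state. */
struct toy_private {
	bool transaction_active;
	int records;
};

/* Meta-API style query, modelled on the slide's transaction_active(). */
static bool toy_transaction_active(const struct toy_private *p)
{
	return p->transaction_active;
}

/* Every write entry point checks for an open write transaction first, so
 * an unlocked modification (e.g. during DB init) fails loudly instead of
 * silently racing other processes. */
int checked_store(struct toy_private *p)
{
	assert(p != NULL);
	if (!toy_transaction_active(p)) {
		fprintf(stderr, "store refused: no write transaction active\n");
		return TOY_ERR_NO_TRANSACTION;
	}
	p->records++;
	return TOY_OK;
}
```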

SLIDE 12

Third hurdle: Maximum key size

  • TDB has an unlimited key size
  • LMDB is limited to 511 bytes
  • LDB traditionally used the DN as the key

Addressed by the new GUID key system
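A GUID key stays the same length however long the DN is. A minimal sketch, assuming the key is a short text prefix followed by the 16 raw bytes of the objectGUID; the "GUID=" prefix and the "DN=" prefix for the old scheme are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LMDB_MAX_KEY 511        /* LMDB key-length limit, per the slides */
#define GUID_KEY_PREFIX "GUID=" /* assumed prefix, for illustration only */
#define GUID_SIZE 16            /* an objectGUID is 16 bytes */

/* Fixed-size record key: prefix + 16 raw GUID bytes.
 * Returns the key length, or 0 if out is too small. */
size_t make_guid_key(const uint8_t guid[GUID_SIZE], uint8_t *out, size_t out_len)
{
	const size_t plen = sizeof(GUID_KEY_PREFIX) - 1; /* drop the NUL */
	assert(guid != NULL && out != NULL);
	if (out_len < plen + GUID_SIZE)
		return 0;
	memcpy(out, GUID_KEY_PREFIX, plen);
	memcpy(out + plen, guid, GUID_SIZE);
	return plen + GUID_SIZE;
}

/* The old scheme keyed records by DN, so the key grew with the DN. */
bool dn_key_fits(const char *dn)
{
	return sizeof("DN=") - 1 + strlen(dn) <= LMDB_MAX_KEY;
}
```

make_guid_key() always yields a 21-byte key here, well under the 511-byte limit, while dn_key_fits() depends on how long the DN is.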

SLIDE 13

But what about indexes?

  • Index created by putting index key and value in the TDB key

@INDEX:SAMACCOUNTNAME:abartlet

  • Original plan was to keep the index in TDB

But the more we understood the locking, the less we wanted to mix TDB and LMDB lock ordering

  • Addressed by truncating the index and coping with multiple matches

Ironically found and fixed the 4.8 upgrade bug

But didn’t realise the importance before everyone noticed
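The index entry from the slide can be built as a plain string key; once it is cut to the key limit, two values that differ only past the cut share one key. A minimal sketch, assuming truncation simply keeps the first max_len bytes (the helper names are hypothetical and the real truncation rule may differ). Because of such collisions, each candidate found through a truncated key has to be re-checked against the real attribute value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LMDB_MAX_KEY 511 /* LMDB key-length limit, per the slides */

/* Build "@INDEX:<ATTR>:<value>" into out and cut it to at most max_len
 * bytes. Returns the key length. */
size_t make_index_key(const char *attr, const char *value,
		      char *out, size_t out_size, size_t max_len)
{
	assert(out_size > 0);
	int n = snprintf(out, out_size, "@INDEX:%s:%s", attr, value);
	if (n < 0)
		return 0;
	size_t len = (size_t)n < out_size ? (size_t)n : out_size - 1;
	if (len > max_len) {
		len = max_len;
		out[len] = '\0';
	}
	return len;
}

/* True if two values end up under the same (possibly truncated) key. */
bool keys_collide(const char *attr, const char *v1, const char *v2, size_t max_len)
{
	char k1[1024], k2[1024];
	size_t l1 = make_index_key(attr, v1, k1, sizeof k1, max_len);
	size_t l2 = make_index_key(attr, v2, k2, sizeof k2, max_len);
	return l1 == l2 && memcmp(k1, k2, l1) == 0;
}

/* A truncated key only yields candidates; confirm each one exactly. */
bool candidate_matches(const char *wanted, const char *stored_value)
{
	return strcmp(wanted, stored_value) == 0;
}
```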

SLIDE 14

And what about performance?

  • Three performance tools measured so far:

make perftest on our hardware test server

  • Old AMD Athlon

Traffic replay tool in the cloud

Adding users, and users into groups, on my workstation

SLIDE 15

make perftest: a small disappointment

  • 30% performance loss

LMDB uses write(), and a read-only mmap()

socket_wrapper intercepts write()

  • Workaround:

Use Linux user namespaces instead of socket_wrapper

Patches to upstream this still pending

  • End result is no major change, perhaps 10% slower

SLIDE 16

Traffic replay

  • This is a tool to replay an amplified anonymous traffic capture
  • Similar numbers to TDB
  • Need to re-try with a larger DB

We think LMDB will show most strength at large sizes

SLIDE 17

Adding users, and users into groups, on my workstation

  • In a four-hour benchmark, adding users and adding each to one to four groups (in rotation):

Samba 4.4: 26,000 users

Samba 4.5: 48,000 users

Samba 4.6: 55,000 users

Samba 4.7: 85,000 users

Samba 4.8: 100,000 users

Samba 4.9: 100,000 users (TDB)

Samba 4.9: 45,000 users (LMDB)

SLIDE 18
  • Ouch. What went wrong!
  • fsync()/fdatasync()/msync() still called
  • Patches quickly written
  • New numbers:

Samba 4.9: 100,000 users (TDB)

Samba 4.9: 124,000 users (LMDB, no fsync())

  • Lesson:

Samba’s module stack is still the slowest factor

SLIDE 19

TDB vs LMDB (latency vs number of users added)

SLIDE 20

Flame Graph

[Flame graph of the user-add benchmark. Recoverable frames include py_ldb_add and py_ldb_modify; the LDB module stack (samldb_modify, acl_modify, objectclass_modify, descriptor_modify, replmd_modify, rdn_name_modify, tombstone_reanimate_modify, extended_dn_in_modify, log_modify, partition_req_callback, operational_callback, ldb_next_request, ldb_module_done); ltdb_* and lmdb_* backend calls such as ltdb_modify, ltdb_search and ltdb_search_and_return_base; talloc frees; and the tevent/epoll event loop.]

SLIDE 21

OK, so not so bad

  • We addressed the customer’s desire for scale

Currently limited to 6GB, but that is a compile-time constant only

  • Opens up new opportunities
  • Current Status:

Still accessed behind sam.ldb (a TDB)

Still one subtree per DB file

SLIDE 22

LMDB: The future

  • Use the LMDB B-tree properties for the index

Allow prefix matching

Allow <= etc

  • Use sub-databases:

Perhaps one per index?

Perhaps one per sub-tree?

  • Use nested transactions to make the index safer
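LMDB keeps keys sorted in a B-tree, so a lookup can seek to the first key at or after a target and walk forward; LMDB's cursor API exposes this seek as MDB_SET_RANGE. A minimal sketch of prefix matching and <= over a sorted array of index keys standing in for a cursor (the function names are hypothetical, and this is not how Samba implements it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First position whose key is >= target (keys sorted by strcmp),
 * like a cursor seek. */
size_t lower_bound(const char *const *keys, size_t n, const char *target)
{
	assert(keys != NULL || n == 0);
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(keys[mid], target) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Prefix match: seek to the prefix, then walk forward while it matches. */
size_t count_prefix(const char *const *keys, size_t n, const char *prefix)
{
	size_t plen = strlen(prefix), count = 0;
	for (size_t i = lower_bound(keys, n, prefix);
	     i < n && strncmp(keys[i], prefix, plen) == 0; i++)
		count++;
	return count;
}

/* "<=": everything before the first key strictly greater than bound. */
size_t count_le(const char *const *keys, size_t n, const char *bound)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(keys[mid], bound) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

Comparisons are bytewise, so <= on numeric attributes only works if values are encoded to sort correctly as bytes (for example fixed-width, big-endian).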
SLIDE 23

LMDB: Sharp Edges

  • Different locking behaviour (no exclusive access)
  • Files are sparse by default

DB operations can fill the file and partition without going via a specific resize

  • Files are not extended automatically

The inverse of the above: when a file is full, unlike TDB, there is no auto-resize

Requires that the admin or Samba know the size up-front

  • LDB / Samba has not required this kind of planning in the past
  • Need real-world experience
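Because the map size is fixed when the environment is set up, a full LMDB database returns an error (MDB_MAP_FULL) instead of growing, and only an explicit mdb_env_set_mapsize() call enlarges it. The sketch below models that behaviour with a hypothetical byte counter; it is not LMDB code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; TOY_MAP_FULL plays the part of MDB_MAP_FULL. */
enum { TOY_SUCCESS = 0, TOY_MAP_FULL = -1, TOY_EINVAL = -2 };

/* Hypothetical model of a fixed-size map: capacity and bytes in use. */
struct toy_map {
	size_t map_size;
	size_t used;
};

/* A write that does not fit fails; the map never grows by itself. */
int toy_put(struct toy_map *m, size_t bytes)
{
	assert(m->used <= m->map_size);
	if (bytes > m->map_size - m->used)
		return TOY_MAP_FULL;
	m->used += bytes;
	return TOY_SUCCESS;
}

/* Explicit resize, the analogue of calling mdb_env_set_mapsize(). */
int toy_set_map_size(struct toy_map *m, size_t new_size)
{
	if (new_size < m->used)
		return TOY_EINVAL;
	m->map_size = new_size;
	return TOY_SUCCESS;
}

/* Three 40-byte writes into a 100-byte map, then grow and retry;
 * returns how many writes succeeded. */
int demo_fixed_map(void)
{
	struct toy_map m = { .map_size = 100, .used = 0 };
	int ok = 0;
	for (int i = 0; i < 3; i++)
		if (toy_put(&m, 40) == TOY_SUCCESS)
			ok++;
	if (toy_set_map_size(&m, 200) == TOY_SUCCESS &&
	    toy_put(&m, 40) == TOY_SUCCESS)
		ok++;
	return ok;
}
```

demo_fixed_map() returns 3: two 40-byte writes fit in the 100-byte map, the third fails, and a further write succeeds only after the map is explicitly grown to 200 bytes.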
SLIDE 24

Beyond LMDB: Supporting more connections on each DC

  • Samba 4.6 removes single-process restrictions on NETLOGON

Really important for 802.1X-backed authentication

  • Samba 4.7 supports a multi-process LDAP server

Actually reduces number of connections you can fit in memory (oops)

  • Samba 4.8 adds a prefork mode for LDAP

Great for a big AD DC with many, many clients

  • Samba 4.9: should we make prefork the default?

However NETLOGON would be single-process in that mode (ouch)

SLIDE 25

New traffic_replay Performance tool

  • We can now record and re-play traffic

Recreate a real-world load

Amplify the traffic

  • Comparative testing now possible between Windows and Samba
  • Samba is now within about 50% of Windows performance

Against a single-CPU target system

Allows us to slow both down enough for the traffic_replay to saturate it

SLIDE 26

Performance against Samba and Windows

  • 1 vCPU
  • Catalyst Cloud
  • After 35x speed the tool exhausts itself
  • So this is not the upper bound

SLIDE 27

Wanted: More network captures

  • We need more samples of network traffic

Anonymised with the traffic_summary.pl script

  • Ideally with permission to publish (e.g. the Samba wiki)
  • Diverse real-world loads will avoid skewed perf testing

SLIDE 28

make perftest graphs

  • April 2016 to Dec 2017

SLIDE 29

Catalyst development beyond performance

  • Encrypted secrets (4.8)

Use a local file key to encrypt DB secrets

(could then be network-deployed)

  • Unix-compatible passwords (4.7)

Store and retrieve sha256/sha512 crypt() passwords to sync with other systems

  • Audit Logging (4.7 and 4.9)

Output audit logs into JSON

  • RODC support (4.7)

This was experimental until now

  • DNS Zone scavenging (for 4.9)
  • Demote cleans up own DNS records (4.9)
  • Fine Grained Password Policy (4.9)
  • Domain Backup (for 4.9)
  • Domain Rename (for 4.9)

SLIDE 30

Catalyst's Open Source Technologies – Questions?

Want to work with my team at Catalyst to make your Samba scale? Talk to me in the hallway track