A Few Things Happened on the Way to LMDB
Presented by Andrew Bartlett, Samba Team - Catalyst / / June 2018
Samba is a member project of the Software Freedom Conservancy
Andrew Bartlett and Catalyst’s Samba Team
- Samba Developer since 2001
- Based in Wellington, New Zealand
- Team lead for the Catalyst Samba Team, including:
– Garming Sam
– Douglas Bagnall
– Gary Lockyer
– Tim Beale
– Joe Guo
– Aaron Haslett
– Jamie McClymont
Not really a story about LMDB
- LMDB pretty much did what it said on the tin
- Instead, LMDB taught us about Samba and LDB
- Numerous locking issues found and fixed
- A new key-value layer added
- And so, so many tests
Customer request: 64-bit DB
- Concerned that the 4GB DB could be filled too quickly
– Wanting to store > 100,000 users in a single domain
- Main concern is the hard limit of TDB
- LMDB chosen as a modern key-value store
– Used in OpenLDAP
Timeline
- LMDB was prototyped in Jan 2017
– Garming Sam
– Inspired by, but rewritten from, Jakub Hrozek’s earlier prototype
- GUID-index LDB implemented in July/August 2017
– LMDB imposes a maximum key length of 511 bytes
- Primary development Jan → May 2018
– Gary Lockyer
A new approach: Key/Value layer
- Jakub’s approach was to copy ldb_tdb and modify it
- Garming and I decided to instead add a key-value layer
– Avoid code duplication
– Allow more than just LMDB (perhaps LMDBx, LevelDB)
– Share performance and correctness improvements with ldb_tdb
– Like dbwrap in concept but specific to LDB needs
Key-value API
- int (*store)(struct ltdb_private *ltdb, struct ldb_val key, struct ldb_val data, int flags);
- int (*delete)(struct ltdb_private *ltdb, struct ldb_val key);
- int (*iterate)(struct ltdb_private *ltdb, ldb_kv_traverse_fn fn, void *ctx);
- int (*update_in_iterate)(struct ltdb_private *ltdb, struct ldb_val key, struct ldb_val key2, struct ldb_val data, void *ctx);
- int (*fetch_and_parse)(struct ltdb_private *ltdb, struct ldb_val key, int (*parser)(struct ldb_val key, struct ldb_val data, void *private_data), void *ctx);
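To make the shape of such a layer concrete, here is a minimal sketch of an operations table with a toy in-memory backend plugged in behind it. Every name in it (kv_val, kv_ops, kv_backend, the toy_* functions) is hypothetical and invented for illustration; it is not the Samba code, and it leaves out flags, iteration and locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ldb_val: a length-counted byte buffer. */
struct kv_val { const unsigned char *data; size_t length; };
struct kv_backend;

/* Hypothetical operations table: callers only ever go through these pointers. */
struct kv_ops {
    int (*store)(struct kv_backend *be, struct kv_val key, struct kv_val data);
    int (*remove)(struct kv_backend *be, struct kv_val key);
    int (*fetch_and_parse)(struct kv_backend *be, struct kv_val key,
                           int (*parser)(struct kv_val key, struct kv_val data,
                                         void *private_data),
                           void *ctx);
};

#define TOY_SLOTS 8
#define TOY_MAX 64

struct toy_record {
    int used;
    unsigned char key[TOY_MAX], data[TOY_MAX];
    size_t key_len, data_len;
};

/* Toy backend: a fixed array of records, searched linearly. */
struct kv_backend {
    const struct kv_ops *ops;
    struct toy_record slots[TOY_SLOTS];
};

static struct toy_record *toy_find(struct kv_backend *be, struct kv_val key)
{
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        struct toy_record *r = &be->slots[i];
        if (r->used && r->key_len == key.length &&
            memcmp(r->key, key.data, key.length) == 0) {
            return r;
        }
    }
    return NULL;
}

static int toy_store(struct kv_backend *be, struct kv_val key, struct kv_val data)
{
    struct toy_record *r;
    if (key.length > TOY_MAX || data.length > TOY_MAX) return -1;
    r = toy_find(be, key);               /* overwrite an existing record */
    for (size_t i = 0; r == NULL && i < TOY_SLOTS; i++) {
        if (!be->slots[i].used) r = &be->slots[i];   /* or take a free slot */
    }
    if (r == NULL) return -1;            /* backend is full */
    r->used = 1;
    memcpy(r->key, key.data, key.length);
    r->key_len = key.length;
    memcpy(r->data, data.data, data.length);
    r->data_len = data.length;
    return 0;
}

static int toy_remove(struct kv_backend *be, struct kv_val key)
{
    struct toy_record *r = toy_find(be, key);
    if (r == NULL) return -1;
    r->used = 0;
    return 0;
}

static int toy_fetch_and_parse(struct kv_backend *be, struct kv_val key,
                               int (*parser)(struct kv_val, struct kv_val, void *),
                               void *ctx)
{
    struct toy_record *r = toy_find(be, key);
    struct kv_val data;
    assert(parser != NULL);
    if (r == NULL) return -1;
    data.data = r->data;
    data.length = r->data_len;
    return parser(key, data, ctx);       /* the parser sees the data in place */
}

static const struct kv_ops toy_ops = { toy_store, toy_remove, toy_fetch_and_parse };

void toy_init(struct kv_backend *be)
{
    memset(be, 0, sizeof(*be));
    be->ops = &toy_ops;
}

/* Example parser: reports the stored record's length through private_data. */
int copy_length(struct kv_val key, struct kv_val data, void *private_data)
{
    (void)key;
    *(size_t *)private_data = data.length;
    return 0;
}

struct kv_val val_from_string(const char *s)
{
    struct kv_val v = { (const unsigned char *)s, strlen(s) };
    return v;
}
```

A caller would run toy_init() on a struct kv_backend and then use only be.ops->store(), be.ops->remove() and be.ops->fetch_and_parse(), so a second backend can be swapped in by supplying a different table.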
Locking API
- Read Locks
– int (*lock_read)(struct ldb_module *);
– int (*unlock_read)(struct ldb_module *);
- Transactions
– int (*begin_write)(struct ltdb_private *);
– int (*prepare_write)(struct ltdb_private *);
– int (*abort_write)(struct ltdb_private *);
– int (*finish_write)(struct ltdb_private *);
Meta API
- int (*error)(struct ltdb_private *ltdb);
- const char * (*errorstr)(struct ltdb_private *ltdb);
- const char * (*name)(struct ltdb_private *ltdb);
- bool (*has_changed)(struct ltdb_private *ltdb);
- bool (*transaction_active)(struct ltdb_private *ltdb);
First Hurdle: Locking
- Even the prototype found issues
– Demonstrated the lack of whole-DB locking
– Fixed for Samba 4.7 last year
- Probably behind many of our replication issues
Second Hurdle: More Locking!
- It just wouldn’t pass make test
– More strange failures in replication
- Unlock ordering issues in replication
– highestCommittedUSN visible before the data
– Fixes proposed for backport to Samba 4.7 and 4.8
- Modification without locks (at startup) in Samba 4.8
– DB-init time only, but not good
– Added checks to key-value layer to prevent re-occurrence
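A check of this kind can be sketched as a guard in the generic layer that refuses any write made outside a transaction. The sketch below is an assumption about the general idea only: the kv_state structure, the error codes and the function names are hypothetical, and real storage is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch. */
enum kv_error {
    KV_OK = 0,
    KV_ERR_NO_TRANSACTION = -1,
    KV_ERR_ALREADY_ACTIVE = -2
};

/* Hypothetical layer state; "writes" stands in for the real storage. */
struct kv_state {
    bool transaction_active;
    int writes;
};

int kv_begin_write(struct kv_state *kv)
{
    assert(kv != NULL);
    if (kv->transaction_active) return KV_ERR_ALREADY_ACTIVE;
    kv->transaction_active = true;
    return KV_OK;
}

int kv_finish_write(struct kv_state *kv)
{
    assert(kv != NULL);
    if (!kv->transaction_active) return KV_ERR_NO_TRANSACTION;
    kv->transaction_active = false;
    return KV_OK;
}

/* The guard: a store outside a transaction is rejected, not silently done. */
int kv_store(struct kv_state *kv)
{
    assert(kv != NULL);
    if (!kv->transaction_active) return KV_ERR_NO_TRANSACTION;
    kv->writes++;
    return KV_OK;
}
```

Putting the guard in the shared layer rather than in each backend means the same mistake is caught whichever store sits underneath.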
Third hurdle: Maximum key size
- TDB has an unlimited key size
- LMDB is limited to 511 bytes
- LDB traditionally used the DN as the key
– Addressed by the new GUID key system
But what about indexes?
- Index created by putting the index key and value in the TDB key
– @INDEX:SAMACCOUNTNAME:abartlet
- Original plan was to keep the index in TDB
– But the more we understood the locking, the less we wanted to mix TDB and LMDB lock ordering
- Addressed by truncating the index key and coping with multiple matches
– Ironically found and fixed the 4.8 upgrade bug
– But didn’t realise the importance before everyone noticed
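The truncation idea can be sketched as follows: build the index key, cut it at the backend's maximum key length, and remember that a truncated key may now be shared by several different values, so every candidate record has to be re-checked against the full value. The function names and the exact key layout here are illustrative assumptions, not the LDB implementation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The 511-byte limit quoted above; the tests use a smaller limit for brevity. */
#define MAX_KEY_LEN 511

/*
 * Build "@INDEX:<attr>:<value>" into out, keeping at most max_key bytes.
 * out must have room for max_key + 1 bytes (the extra one is the NUL).
 * Returns true if the key had to be truncated.
 */
bool make_index_key(char *out, size_t max_key,
                    const char *attr, const char *value)
{
    int needed = snprintf(out, max_key + 1, "@INDEX:%s:%s", attr, value);
    return needed < 0 || (size_t)needed > max_key;
}

/*
 * After a lookup by a truncated key, each candidate's stored value must be
 * compared with the full value that was searched for.
 */
bool candidate_matches(const char *stored_value, const char *wanted_value)
{
    return strcmp(stored_value, wanted_value) == 0;
}
```

With a real buffer this would be called as make_index_key(buf, MAX_KEY_LEN, "SAMACCOUNTNAME", "abartlet"); short values like this one are never truncated, so the extra comparison only costs anything for unusually long values.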
And what about performance?
- Three performance tools measured so far:
– make perftest on our hardware test server (an old AMD Athlon)
– Traffic replay tool in the cloud
– Adding users, and users into groups, on my workstation
make perftest: a small disappointment
- 30% performance loss
– LMDB uses write(), and a read-only mmap()
– socket_wrapper intercepts write()
- Workaround:
– Use Linux user namespaces instead of socket_wrapper
– Patches to upstream this still pending
- End result is no major change, perhaps 10% slower
Traffic replay
- This is a tool to replay an amplified, anonymised traffic capture
- Similar numbers to TDB
- Need to re-try with a larger DB
– We think LMDB will show most strength at large sizes
Adding users, and users into groups, on my workstation
- In a four-hour benchmark adding users and adding them to one to four groups (in rotation):
– Samba 4.4: 26,000 users
– Samba 4.5: 48,000 users
– Samba 4.6: 55,000 users
– Samba 4.7: 85,000 users
– Samba 4.8: 100,000 users
– Samba 4.9: 100,000 users (TDB)
– Samba 4.9: 45,000 users (LMDB)
- Ouch. What went wrong?
- fsync()/fdatasync()/msync() still called
- Patches quickly written
- New numbers:
– Samba 4.9: 100,000 users (TDB)
– Samba 4.9: 124,000 users (LMDB, no fsync())
- Lesson:
– Samba’s module stack is still the slowest factor
TDB vs LMDB (latency vs number of users added)
Flame Graph
(Figure: flame graph. Visible frames include py_ldb_add, py_ldb_modify, ldb_next_request, ldb_module_done, ltdb_search, ltdb_modify, samldb_modify, replmd_modify and the tevent event loop.)
OK, so not so bad
- We addressed the customer’s desire for scale
– Currently limited to 6GB, but that is only a compile-time constant
- Opens up new opportunities
- Current Status:
– Still accessed behind sam.ldb (a TDB)
– Still one subtree per DB file
LMDB: The future
- Use the LMDB B-tree properties for the index
– Allow prefix matching
– Allow <= etc
- Use sub-databases:
– Perhaps one per index?
– Perhaps one per sub-tree?
- Use nested transactions to make the index safer
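Why ordered keys make prefix matching cheap can be shown without LMDB at all. The sketch below uses a plain sorted array as a stand-in for the B-tree: find the first key not less than the prefix, then walk forward while the prefix still matches. The function names are hypothetical; in LMDB itself the seek step would presumably map onto a cursor positioned with MDB_SET_RANGE, which is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first key >= target in a sorted array (binary search). */
size_t lower_bound(const char *const *keys, size_t n, const char *target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(keys[mid], target) < 0) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/*
 * Count keys starting with prefix. Because the keys are sorted, all matches
 * are contiguous and begin at the lower bound of the prefix itself.
 */
size_t count_prefix_matches(const char *const *keys, size_t n, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t i = lower_bound(keys, n, prefix);
    size_t count = 0;
    while (i < n && strncmp(keys[i], prefix, plen) == 0) {
        count++;
        i++;
    }
    return count;
}
```

The same seek-then-scan pattern covers range comparisons such as <=, since the scan simply starts or stops at the bound instead of at the end of the prefix run.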
LMDB: Sharp Edges
- Different locking behaviour (no exclusive access)
- Files are sparse by default
– DB operations can fill the file and partition without going via a specific resize
- Files are not extended automatically
– The inverse of the above: when a file is full, unlike TDB there is no auto-resize
– Requires that the admin or Samba know the size up-front
- LDB / Samba has not required this kind of planning in the past
- Need real-world experience
Beyond LMDB: Supporting more connections on each DC
- Samba 4.6 removes single-process restrictions on NETLOGON
– Really important for 802.1x-backed authentication
- Samba 4.7 supports a multi-process LDAP server
– Actually reduces number of connections you can fit in memory (oops)
- Samba 4.8 adds a prefork mode for LDAP
– Great for a big AD DC with many, many clients
- Samba 4.9: should we make prefork the default?
– However NETLOGON would be single-process in that mode (ouch)
New traffic_replay performance tool
- We can now record and re-play traffic
– Recreate a real-world load
– Amplify the traffic
- Comparative testing now possible between Windows and Samba
- Samba is now within about 50% of Windows performance
– Against a single-CPU target system
– Allows us to slow both down enough for traffic_replay to saturate it
Performance against Samba and Windows
- 1 vCPU
- Catalyst Cloud
- After 35x speed the tool exhausts itself
- So this is not the upper bound
Wanted: More network captures
- We need more samples of network traffic
– Anonymised with the traffic_summary.pl script
- Ideally with permission to publish (e.g. on the Samba wiki)
- Diverse real-world loads will avoid skewed perf testing
make perftest graphs
- April 2016 to Dec 2017
Catalyst development beyond performance
- Encrypted secrets (4.8)
– Use a local file key to encrypt DB secrets
– (could then be network-deployed)
- Unix-compatible passwords (4.7)
– Store and retrieve sha256/sha512 crypt() passwords to sync with other systems
- Audit Logging (4.7 and 4.9)
– Output audit logs into JSON
- RODC support (4.7)
– This was experimental until now
- DNS Zone scavenging (for 4.9)
- Demote cleans up own DNS records (4.9)
- Fine Grained Password Policy (4.9)
- Domain Backup (for 4.9)
- Domain Rename (for 4.9)