

  1. BabuDB: Fast and Efficient File System Metadata Storage. Jan Stender, Björn Kolbeck, Mikael Högqvist (Zuse Institute Berlin); Felix Hupfeld (Google GmbH Zurich). SNAPI 2010 · Jan Stender

  2. Motivation: modern parallel / distributed file systems
     – Huge numbers of files and directories
     – Many storage servers, but few metadata servers
     – Examples: Lustre, Panasas ActiveScale, Google File System
     – Metadata access is critical to system performance: ~75% of all file system calls are metadata accesses
     – Metadata servers become bottlenecks

  3. Motivation: B-tree-like data structures are used for metadata storage
     – ZFS, btrfs, Lustre, PVFS2
     – Downsides: hard to implement and test, high code complexity
     – Multi-version B-trees are even more complex
     – On-disk re-balancing is expensive

  4. BabuDB: a key-value store; file system metadata is stored as key-value pairs in database indices
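The key-value view of metadata can be illustrated with a small sketch (hypothetical names and encoding, not BabuDB's actual API): each file's stat-like attributes are serialized into a value and stored in a sorted index under a string key.

```java
import java.util.TreeMap;

// Minimal sketch of the key-value view of file system metadata
// (hypothetical names, not BabuDB's real API): one sorted index
// maps a string key to a serialized metadata record.
public class MetadataIndex {
    // stat-like record serialized as the value (size, mode, mtime)
    static String encodeStat(long size, int mode, long mtime) {
        return size + "|" + mode + "|" + mtime;
    }

    private final TreeMap<String, String> index = new TreeMap<>();

    public void put(String key, String value) { index.put(key, value); }
    public String lookup(String key) { return index.get(key); }

    public static void main(String[] args) {
        MetadataIndex idx = new MetadataIndex();
        idx.put("hosts", encodeStat(312, 420, 1280000000L));
        System.out.println(idx.lookup("hosts")); // prints "312|420|1280000000"
    }
}
```

A `TreeMap` stands in for the database index here; the point is only that metadata operations reduce to puts and gets on a sorted map.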

  5. BabuDB: Index
  6. Example
  7.–8. Example: Insertions
  9.–12. Example: Lookups
  13.–16. Example: Deletions
  17.–20. Example: Range Lookups
  21.–24. Example: Checkpoints
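The example slides walk through insertions, lookups, deletions, range lookups and checkpoints on diagrams only; the mechanics they illustrate can be sketched in code (a hypothetical toy, not BabuDB's real implementation): mutations go to an in-memory overlay, the on-disk index stays immutable, deletions are tombstones, and a checkpoint merges the overlay into a new base index.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// LSM-style index sketch (hypothetical, not BabuDB's real code):
// writes hit an in-memory overlay; the base index (standing in for
// the immutable on-disk index) is only rewritten at checkpoints.
public class LsmIndexSketch {
    private TreeMap<String, String> overlay = new TreeMap<>();
    private TreeMap<String, String> base = new TreeMap<>();

    public void insert(String k, String v) { overlay.put(k, v); }
    public void delete(String k) { overlay.put(k, null); } // tombstone

    // Lookup checks the overlay first, then falls back to the base.
    public String lookup(String k) {
        if (overlay.containsKey(k)) return overlay.get(k); // may be a tombstone
        return base.get(k);
    }

    // Range lookup merges both levels; overlay entries win.
    public SortedMap<String, String> range(String from, String to) {
        TreeMap<String, String> result = new TreeMap<>(base.subMap(from, to));
        for (Map.Entry<String, String> e : overlay.subMap(from, to).entrySet()) {
            if (e.getValue() == null) result.remove(e.getKey());
            else result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    // Checkpoint: materialize overlay + base into a fresh base index.
    public void checkpoint() {
        for (Map.Entry<String, String> e : overlay.entrySet()) {
            if (e.getValue() == null) base.remove(e.getKey());
            else base.put(e.getKey(), e.getValue());
        }
        overlay = new TreeMap<>();
    }

    public static void main(String[] args) {
        LsmIndexSketch idx = new LsmIndexSketch();
        idx.insert("a", "1");
        idx.insert("b", "2");
        idx.delete("a");
        idx.checkpoint();
        System.out.println(idx.lookup("b")); // prints "2"
    }
}
```

Tombstones are what make deletion cheap: nothing is removed from the immutable base until the next checkpoint rewrites it.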

  25. On-disk Index: sorted by keys; the block index is kept in RAM, the blocks themselves are mmap'ed
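The block-index idea can be sketched as follows (hypothetical code; arrays stand in for mmap'ed disk blocks): entries are sorted and grouped into fixed-size blocks, only each block's first key is kept in RAM, and a lookup binary-searches the in-RAM block index before searching inside the single candidate block.

```java
import java.util.Arrays;

// Sketch of the on-disk index layout (hypothetical, not BabuDB's
// real code): sorted keys are split into fixed-size blocks; the
// in-RAM "block index" holds each block's first key, while the
// blocks themselves would be mmap'ed from disk.
public class BlockIndexSketch {
    static final int BLOCK_SIZE = 4;
    private final String[][] blocks;  // stand-in for mmap'ed disk blocks
    private final String[] firstKeys; // in-RAM block index

    public BlockIndexSketch(String[] sortedKeys) {
        int n = (sortedKeys.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blocks = new String[n][];
        firstKeys = new String[n];
        for (int i = 0; i < n; i++) {
            int from = i * BLOCK_SIZE;
            int to = Math.min(from + BLOCK_SIZE, sortedKeys.length);
            blocks[i] = Arrays.copyOfRange(sortedKeys, from, to);
            firstKeys[i] = blocks[i][0];
        }
    }

    public boolean contains(String key) {
        int b = Arrays.binarySearch(firstKeys, key);
        if (b < 0) b = -b - 2; // last block whose first key precedes `key`
        if (b < 0) return false; // key sorts before the whole index
        return Arrays.binarySearch(blocks[b], key) >= 0;
    }

    public static void main(String[] args) {
        BlockIndexSketch idx =
            new BlockIndexSketch(new String[]{"a", "b", "c", "d", "e", "f"});
        System.out.println(idx.contains("e")); // prints "true"
    }
}
```

Because the blocks are immutable and sorted, no on-disk re-balancing is ever needed, which is exactly the cost the B-tree slide complains about.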

  26. BabuDB: Related Work
     – Inspired by log-structured merge trees (LSM-trees), but with only one on-disk index and no "rolling merge"
     – LSM-trees were made popular by Google Bigtable; insert/lookup/merge work similarly to Bigtable's tablets

  27. BabuDB: Metadata Mapping: mapping a hierarchical directory tree onto a flat database index
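One common way to flatten a directory tree into a sorted index can be sketched like this (a hypothetical encoding, not necessarily the one BabuDB uses): key each entry by (parent directory ID, file name), so that all entries of one directory share a key prefix and sort contiguously, making readdir a single range scan.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical flattening of a directory tree into a sorted index:
// key = fixed-width parent ID + "/" + file name. Entries of the same
// directory share a prefix and therefore sort next to each other.
public class FlatMetadataMapping {
    private final TreeMap<String, String> index = new TreeMap<>();

    static String key(long parentId, String name) {
        return String.format("%016x/%s", parentId, name);
    }

    public void link(long parentId, String name, String metadata) {
        index.put(key(parentId, name), metadata);
    }

    // readdir: one contiguous range scan over the parent's key prefix
    public SortedMap<String, String> readdir(long parentId) {
        return index.subMap(key(parentId, ""), key(parentId + 1, ""));
    }

    public static void main(String[] args) {
        FlatMetadataMapping fs = new FlatMetadataMapping();
        fs.link(1, "bin", "dir");
        fs.link(1, "etc", "dir");
        fs.link(2, "hosts", "file");
        System.out.println(fs.readdir(1).size()); // prints "2"
    }
}
```

The fixed-width hex parent ID keeps lexicographic key order consistent with numeric directory order, which is what makes the prefix range scan correct.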

  28. BabuDB: Advantages. Why BabuDB for file system metadata?
     – Short-lived files: 50% of all files are deleted within 5 minutes
     – Atomic file system operations without locking or transactions (e.g. rename)
     – Directory content lies in contiguous disk regions: efficient readdir + stat
     – Snapshots: no need for multi-version data structures
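The snapshot advantage follows from the overlay design and can be sketched in a few lines (hypothetical code, not BabuDB's real API): since the on-disk index is immutable and all changes live in an in-memory overlay, taking a snapshot just freezes the current overlay and starts a new empty one, with no multi-version tree required.

```java
import java.util.TreeMap;

// Hypothetical snapshot sketch for an overlay-based index: the base
// on-disk index is immutable anyway, so a snapshot only needs to
// freeze the current overlay and open a fresh one.
public class SnapshotSketch {
    private TreeMap<String, String> overlay = new TreeMap<>();
    private TreeMap<String, String> frozen = null; // last snapshot's overlay

    public void put(String k, String v) { overlay.put(k, v); }

    // Cheap snapshot: swap in a fresh overlay, keep the old one read-only.
    public void snapshot() {
        frozen = overlay;
        overlay = new TreeMap<>();
    }

    public String lookupInSnapshot(String k) {
        return frozen == null ? null : frozen.get(k);
    }

    public String lookupCurrent(String k) {
        if (overlay.containsKey(k)) return overlay.get(k);
        return frozen == null ? null : frozen.get(k);
    }

    public static void main(String[] args) {
        SnapshotSketch s = new SnapshotSketch();
        s.put("a", "old");
        s.snapshot();
        s.put("a", "new");
        System.out.println(s.lookupInSnapshot("a")); // prints "old"
    }
}
```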

  29. BabuDB: Evaluation (charts compare BabuDB vs. ext4, runtime in seconds)
     – Linux kernel build: ~10M calls (44% stat, 40% open, 15% readlink, 1% others)
     – Dovecot mail server + imaptest: ~2M calls (51% stat, 48% open, 1% others)

  30. BabuDB: Evaluation: listing directory content

  31. Summary: BabuDB is ...
     – an efficient key-value store
     – optimized for file system metadata, but also suitable for other purposes
     – suitable for large-scale databases
     – available for Java and C++ under the BSD license: http://babudb.googlecode.com
     – used in the XtreemFS metadata server: http://www.xtreemfs.org

  32. Thank you for your attention!

  33. Background: XtreemFS
     – XtreemFS: a distributed, replicated Internet file system
     – part of the XtreemOS research project
     – developed since 2006 by partners from Germany, Spain and Italy
     – object-based architecture: the MRC stores metadata, OSDs store pure file content as objects, and clients provide a POSIX file system interface
     – www.xtreemfs.org

  34. The XtreemOS Project
     – research project funded by the European Commission
     – 19 partners from Europe and China
     – XtreemFS is the data management component, developed by ZIB, NEC HPC Europe, Barcelona Supercomputing Center and ICAR-CNR Italy
     – ~3 years of development; first public release in August 2008

  35. XtreemFS: Overview. What is XtreemFS?
     – a distributed and replicated POSIX-compliant file system
     – runs on off-the-shelf servers, no expensive hardware
     – servers in Java, run on Linux / OS X / Solaris
     – client in C, runs on Linux / OS X / Windows
     – secure (X.509 and SSL)
     – easy to install and maintain
     – open source (GPL)

  36. File System Landscape
     – Centralized PC: ext3, ZFS, NTFS
     – Network FS: NFS, SMB, AFS/Coda
     – Cluster FS / Data Center: Lustre, Panasas, GPFS, CEPH...
     – Internet / Grid File System: GDM, GFarm, "gridftp"
