1. Cloud Filesystem
   Jeff Darcy for BBLISA, October 2011

2. What is a Filesystem?
   • “The thing every OS and language knows”
   • Directories, files, file descriptors
   • Directories within directories
   • Operate on single record (POSIX: single byte) within a file
   • Built-in permissions model (e.g. UID, GID, ugo·rwx)
   • Defined concurrency behaviors (e.g. fsync)
   • Extras: symlinks, ACLs, xattrs
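The bullets above map directly onto a few system calls. A minimal C sketch, assuming Linux for the xattr call: a byte-granular positioned write, an explicit fsync() durability point, and an extended attribute as one of the "extras".

```c
/* Minimal illustration of POSIX filesystem semantics (Linux assumed). */
#include <fcntl.h>
#include <sys/xattr.h>   /* fsetxattr(): Linux-specific */
#include <unistd.h>

int main(void)
{
    int fd = open("demo.dat", O_CREAT | O_RDWR, 0644);  /* ugo·rwx bits */
    if (fd < 0)
        return 1;

    pwrite(fd, "x", 1, 4096);  /* operate on a single byte at any offset */
    fsync(fd);                 /* defined durability point */

    fsetxattr(fd, "user.note", "hi", 2, 0);  /* extras: xattrs */

    close(fd);
    return 0;
}
```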

3. Are Filesystems Relevant?
   • Supported by every language and OS natively
   • Shared data with rich semantics
   • Graceful and efficient handling of multi-GB objects
   • Permission model missing in some alternatives
   • Polyglot storage, e.g. DB to index data in FS

4. Network Filesystems
   • Extend filesystem to multiple clients
   • Awesome idea so long as total required capacity/performance doesn't exceed a single server
     o ...otherwise you get server sprawl
   • Plenty of commercial vendors, community experience
   • Making NFS highly available brings extra headaches

5. Distributed Filesystems
   • Aggregate capacity/performance across servers
   • Built-in redundancy
     o ...but watch out: not all deal with HA transparently
   • Among the most notoriously difficult kinds of software to set up, tune and maintain
     o Anyone want to see my Lustre scars?
   • Performance profile can be surprising
   • Result: seen as specialized solution (esp. HPC)

6. Example: NFS4.1/pNFS
   • pNFS distributes data access across servers
   • Referrals etc. offload some metadata
   • Only a protocol, not an implementation
     o OSS clients, proprietary servers
   • Does not address metadata scaling at all
   • Conclusion: partial solution, good for compatibility; a full solution might layer on top of something else

7. Example: Ceph
   • Two-layer architecture
   • Object layer (RADOS) is self-organizing (see the sketch after the diagram)
     o can be used alone for block storage via RBD
   • Metadata layer provides POSIX file semantics on top of RADOS objects
   • Full-kernel implementation
   • Great architecture, some day it will be a great implementation

8. Ceph Diagram
   [Diagram: a client talks to metadata servers and data servers, all built on the Ceph RADOS layer]
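Because the RADOS layer is usable on its own, you can store objects below the filesystem entirely. A hedged sketch with the librados C API; the pool name "data" and the conf path are assumptions for illustration.

```c
/* Sketch: storing one object directly in RADOS, below the POSIX layer.
 * Pool name "data" and the conf path are assumptions for illustration. */
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    const char payload[] = "hello rados";

    if (rados_create(&cluster, "admin") < 0)
        return 1;
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "data", &io) < 0)
        return 1;

    /* Objects are flat named blobs: no directories, no POSIX semantics. */
    rados_write_full(io, "greeting", payload, sizeof(payload));

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}
```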

9. Example: GlusterFS
   • Single-layer architecture
     o sharding instead of layering (sketch after the diagram)
     o one type of server: data and metadata
   • Servers are dumb, smart behavior driven by clients
   • FUSE implementation
   • Native, NFSv3, UFO, Hadoop

10. GlusterFS Diagram
   [Diagram: a client talks directly to bricks A through D; every brick holds both data and metadata]
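The "servers are dumb, clients are smart" point is easiest to see in placement: every client hashes the file name and picks a brick, with no metadata server to ask. GlusterFS's DHT actually uses its own hash function over per-directory hash ranges; the FNV-1a hash below is just a stand-in to show the idea.

```c
/* Sketch of client-side placement: hash the file name, pick a brick.
 * FNV-1a is illustrative only; GlusterFS's DHT uses a different hash. */
#include <stdint.h>
#include <stdio.h>

static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

int main(void)
{
    const char *bricks[] = { "brick-a", "brick-b", "brick-c", "brick-d" };
    const char *name = "report.txt";

    /* No metadata server: every client computes the same answer. */
    printf("%s -> %s\n", name, bricks[fnv1a(name) % 4]);
    return 0;
}
```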

11. OK, What About HekaFS?
   • Don't blame me for the name
     o trademark issues are a distraction from real work
   • Existing DFSes solve many problems already
     o sharding, replication, striping
   • What they don't address is cloud-specific deployment
     o lack of trust (user/user and user/provider)
     o location transparency
     o operationalization

12. Why Start With GlusterFS?
   • Not going to write my own from scratch
     o been there, done that
     o leverage existing code, community, user base
   • Modular architecture allows adding functionality via an API
     o separate licensing, distribution, support
   • By far the best configuration/management
   • OK, so it's FUSE
     o not as bad as people think
       + add more servers
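On the modular-architecture bullet: GlusterFS features are packaged as stackable translators (xlators). A hedged skeleton of a pass-through translator, using 3.x-era symbol names; exact headers and required tables vary by version.

```c
/* Hedged skeleton of a GlusterFS translator (xlator), the extension point
 * referred to above. Symbol names match the 3.x-era API; exact headers and
 * required tables vary by version. Empty fops/cbks tables make this a pure
 * pass-through: the loader fills in defaults that forward to the child. */
#include <glusterfs/xlator.h>

int32_t
init (xlator_t *this)
{
    gf_log (this->name, GF_LOG_INFO, "sample translator loaded");
    return 0;
}

void
fini (xlator_t *this)
{
}

struct xlator_fops fops = { };
struct xlator_cbks cbks = { };

struct volume_options options[] = {
    { .key = { NULL } },
};
```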

13. HekaFS Current Features
   • Directory isolation
   • ID isolation
     o “virtualize” between server ID space and tenants'
   • SSL
     o encryption useful on its own
     o authentication is needed by other features
   • At-rest encryption
     o keys ONLY on clients
     o AES-256 through AES-1024, “ESSIV-like”
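On the "ESSIV-like" point: ESSIV derives each block's IV by encrypting the block number under a hash of the data key, so IVs are unpredictable to an attacker yet reproducible without being stored. A sketch of that idea with OpenSSL; this illustrates the general technique, not HekaFS's actual code.

```c
/* ESSIV-style IV derivation: IV = AES-256_{SHA256(key)}(block_number).
 * General technique only, not HekaFS's implementation. */
#include <openssl/evp.h>
#include <openssl/sha.h>
#include <stdint.h>
#include <string.h>

int essiv_iv(const unsigned char *key, size_t keylen,
             uint64_t block, unsigned char iv[16])
{
    unsigned char salt[SHA256_DIGEST_LENGTH];   /* IV key = hash of data key */
    unsigned char blk[16] = { 0 };
    int outl = 0, ok = 0;

    SHA256(key, keylen, salt);
    memcpy(blk, &block, sizeof(block));         /* block number as plaintext */

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    if (ctx &&
        EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, salt, NULL) &&
        EVP_CIPHER_CTX_set_padding(ctx, 0) &&
        EVP_EncryptUpdate(ctx, iv, &outl, blk, sizeof(blk)))
        ok = 1;
    EVP_CIPHER_CTX_free(ctx);
    return ok ? 0 : -1;
}

int main(void)
{
    unsigned char key[32] = { 0 }, iv[16];
    return essiv_iv(key, sizeof(key), 42, iv);  /* iv holds the block's IV */
}
```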

14. HekaFS Future Features
   • Enough of multi-tenancy, now for other stuff
   • Improved (local/sync) replication
     o lower latency, faster repair
   • Namespace (and small-file?) caching
   • Improved data integrity
   • Improved distribution
     o higher server counts, smoother reconfiguration
   • Erasure codes?
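For the erasure-codes bullet, the simplest instance is single-parity XOR (RAID-5 style): any one lost fragment is rebuilt from the survivors. A toy sketch:

```c
/* Simplest erasure code: single-parity XOR. One lost fragment of the
 * stripe can be rebuilt from the remaining fragments plus parity. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t d[3] = { 0x41, 0x42, 0x43 };   /* data fragments */
    uint8_t p = d[0] ^ d[1] ^ d[2];        /* parity fragment */

    /* Lose d[1]; recover it from the survivors plus parity. */
    uint8_t rebuilt = d[0] ^ d[2] ^ p;
    printf("recovered 0x%02x (expected 0x42)\n", rebuilt);
    return 0;
}
```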

15. HekaFS Global Replication
   • Multi-site asynchronous
   • Arbitrary number of sites
   • Write from any site, even during partition
     o ordered, eventually consistent with conflict resolution
   • Caching is just a special case of replication
     o interest expressed (and withdrawn), not assumed
   • Some infrastructure being done early for local replication
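"Eventually consistent with conflict resolution" usually means tracking causality per site; version vectors are the standard building block. A sketch of conflict detection under that assumption, not HekaFS's actual design:

```c
/* Version vectors across N sites: detect whether two updates are ordered
 * or concurrent (concurrent = a conflict needing resolution). */
#include <stdio.h>

#define SITES 3

typedef struct { unsigned v[SITES]; } vclock;

/* 1 if a dominates b, -1 if b dominates a, 0 if equal/concurrent. */
static int vclock_cmp(const vclock *a, const vclock *b)
{
    int a_ge = 1, b_ge = 1;
    for (int i = 0; i < SITES; i++) {
        if (a->v[i] < b->v[i]) a_ge = 0;
        if (b->v[i] < a->v[i]) b_ge = 0;
    }
    if (a_ge && !b_ge) return 1;
    if (b_ge && !a_ge) return -1;
    return 0;
}

int main(void)
{
    vclock a = { {2, 0, 0} }, b = { {1, 1, 0} };   /* concurrent updates */
    printf("cmp = %d\n", vclock_cmp(&a, &b));      /* 0: conflict */
    return 0;
}
```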

16. Project Status
   • All open source
     o code hosted by Fedora, bugzilla by Red Hat
     o Red Hat also pays me (and others) to work on it
   • Close collaboration with Gluster
     o they do most of the work
     o they're open-source folks too
     o completely support their business model
   • “current” = Fedora 16
   • “future” = Fedora 17+ and Red Hat product

17. Contact Info
   • Project
     o http://hekafs.org
     o jdarcy@redhat.com
   • Personal
     o http://pl.atyp.us
     o jeff@pl.atyp.us
