
HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System

Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki, Aniruddha Bohra
Feb 26, 2010


  1. HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System
     Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki, Aniruddha Bohra
     Feb 26, 2010

  2. HYDRAstor: De-duplicated Scalable Storage
     • Scale-out storage
     • With global de-duplication
     • Using Content-Defined Chunking
     • Resilient to multiple failures
     • Easy to manage (self-healing, …)
     • High throughput for streaming access
     • Std. interfaces (NFS/CIFS, VTL, …)
     Diagram, top to bottom:
     Access Layer (FAST’10: "HydraFS: a High Throughput Filesystem"; "Bimodal CDC for Backup Streams")
       • Standard protocols
       • Chunking
       • High throughput
     Content-addressable API
     Content-addressable Block Store (FAST’09: "HYDRAstor: a Scalable Secondary Storage")
       • Scalable
       • Easy to manage
       • Resilient
       • High throughput
     FAST 2010 – HydraFS: a High Throughput Filesystem for CAS

  3. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     Diagram labels: B1; File System; Block Store

  4. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     Diagram labels: File System; CA 1; B1; Block Store

  5. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     Diagram labels: B2; File System; CA 1; B1; Block Store

  6. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     Diagram labels: File System; CA 1; CA 2; B1; B2; Block Store

  7. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     Diagram labels: B3; File System; CA 1; CA 2; B1; B2; Block Store

  8. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     • Duplicates eliminated by store
     Diagram labels: File System; CA 1; CA 1; CA 2; B3; B1; B2; Block Store

  9. HYDRAstor Usage Example
     Block Store (CAS) API
     • Variable-size blocks
     • Content-addressable
     • Address decided by the store
     • Duplicates eliminated by store
     Diagram labels: CA 1; File System; CA 1; CA 2; B3; B1; B2; Block Store

  10. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      Diagram labels: CA 1; CA 2; File System; CA 1; B3; B1; B2; Block Store

  11. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      Diagram labels: CA 1; CA 2; CA 1; File System; B1; B2; Block Store

  12. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      • Configurable block resilience
      Diagram labels: B4; CA 1; CA 2; CA 1; File System; B1; B2; Block Store

  13. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      • Configurable block resilience
      Diagram labels: File System; CA 3; B4; CA 1; CA 2; CA 1; B1; B2; Block Store

  14. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      • Configurable block resilience
      Diagram labels: CA 3; File System; B4; CA 1; CA 2; CA 1; B1; B2; Block Store

  15. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      • Configurable block resilience
      • Garbage collection
      Diagram labels: Root 1; CA 3; File System; B4; CA 1; CA 2; CA 1; B1; B2; Block Store

  16. HYDRAstor Usage Example
      Block Store (CAS) API
      • Variable-size blocks
      • Content-addressable
      • Address decided by the store
      • Duplicates eliminated by store
      • Configurable block resilience
      • Garbage collection
      Diagram labels: File System; Root 1; CA 3; B4; CA 1; CA 2; CA 1; B1; B2; Block Store
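
The usage example in slides 3 to 16 can be sketched as a toy in-memory store. This is only an illustration under stated assumptions: the class `ToyBlockStore`, its method names, and the use of SHA-256 are hypothetical and not taken from the HYDRAstor API. Garbage collection is modelled as a simple mark-and-sweep from named roots, and configurable block resilience is not modelled at all.

```python
import hashlib
import json


class ToyBlockStore:
    """In-memory stand-in for a content-addressable block store (hypothetical API)."""

    def __init__(self):
        self.blocks = {}   # address -> (data, tuple of child addresses)
        self.roots = {}    # root name -> address

    def write(self, data: bytes, pointers=()) -> str:
        # The store, not the caller, decides the address: here a hash of the
        # block's data together with the addresses it points to.
        payload = json.dumps([data.hex(), list(pointers)]).encode()
        address = hashlib.sha256(payload).hexdigest()
        # Writing identical content again yields the same address (dedup).
        self.blocks.setdefault(address, (data, tuple(pointers)))
        return address

    def read(self, address: str) -> bytes:
        return self.blocks[address][0]

    def set_root(self, name: str, address: str) -> None:
        self.roots[name] = address

    def collect_garbage(self) -> int:
        # Mark every block reachable from a root, then sweep the rest.
        live, stack = set(), list(self.roots.values())
        while stack:
            addr = stack.pop()
            if addr not in live:
                live.add(addr)
                stack.extend(self.blocks[addr][1])
        dead = [a for a in self.blocks if a not in live]
        for a in dead:
            del self.blocks[a]
        return len(dead)


store = ToyBlockStore()
ca1 = store.write(b"block B1")
ca2 = store.write(b"block B2")
ca1_again = store.write(b"block B1")              # B3: same content as B1
ca3 = store.write(b"", pointers=[ca1, ca2, ca1])  # B4: holds three addresses
store.set_root("Root 1", ca3)
orphan = store.write(b"never referenced")
removed = store.collect_garbage()                 # only the orphan is swept
```

In this sketch the third write returns the address already issued for B1, so only one copy is kept, and the unreferenced block disappears once garbage collection runs because nothing reachable from `Root 1` points to it.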

  17. Outline
      • HYDRAstor content-addressable API
      • Challenges posed to the filesystem
      • Filesystem architecture
      • Techniques used to overcome the challenges
      • Conclusions and future work

  18. Challenges
      • Content-addressable blocks
        – A change in a block’s contents also changes the block’s address
          • All metadata has to change, recursively up to the filesystem root
          • Parent can only be written after the children writes are successful
      • Variable-sized chunking (splitting file data into blocks)
        – Block boundaries change when content is changed
        – Overwrites cause read-rechunk-rewrite
      • High-latency block store operations
        – Why? Hashing, compression, erasure coding, fragment distribution, …
        – Exacerbates the above two challenges
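
The first challenge can be made concrete with a small sketch. It assumes, purely for illustration, that an address is a SHA-256 hash over a block's data and its children's addresses; the function names and the three-level leaf/inode/superblock tree are hypothetical simplifications of the real layout.

```python
import hashlib


def address(data: bytes, children=()) -> str:
    # Hypothetical addressing rule: hash the data plus every child address.
    h = hashlib.sha256(data)
    for child in children:
        h.update(bytes.fromhex(child))
    return h.hexdigest()


def write_tree(leaves):
    # Children are addressed first: a parent embeds their addresses,
    # so it cannot be built until they are known.
    leaf_addrs = [address(leaf) for leaf in leaves]
    inode = address(b"inode", leaf_addrs)
    root = address(b"superblock", [inode])
    return leaf_addrs, inode, root


before = write_tree([b"aaaa", b"bbbb", b"cccc"])
after = write_tree([b"aaaa", b"bbXb", b"cccc"])   # one byte changed in one leaf
```

Comparing `before` and `after` shows that the untouched leaves keep their addresses, while the modified leaf, the inode above it, and the root all receive new ones.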

  19. Persistent Layout
      Diagram labels: Filesystem superblock (root block); Inode map root; Inode map B-tree; Inode map (segmented array); Directory inode; File inode; Inode B-tree; Directory B-tree; Directory contents; File contents

  20. HydraFS Architecture
      Diagram labels: User operations; Control messages; File Server; Commit Server; Filesystem; File System; TS=1; TS=20; Block Store; Root; Metadata; Data; Update log (TS=1; op 1, op 2, ... TS=2; … TS=3; …)

  21. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      Diagram labels: Write Buffer (dirty data)

  22. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      Diagram labels: Write Buffer (dirty data); Chunker
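
A minimal sketch of content-based boundary selection follows. The slides do not say which hash or parameters the chunker uses, so the CRC-32 over a 16-byte window, the 8-bit mask, and the function name `chunk` are all assumptions; minimum and maximum chunk sizes, which a practical chunker would enforce, are omitted to keep the behaviour easy to reason about.

```python
import random
import zlib

WINDOW = 16    # bytes of context examined at each position (assumed value)
MASK = 0xFF    # cut when the low 8 hash bits are zero, about 256 bytes on average


def chunk(data: bytes) -> list[bytes]:
    chunks, start = [], 0
    for end in range(WINDOW, len(data)):
        # The decision depends only on the WINDOW bytes before `end`,
        # never on the absolute file offset.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    chunks.append(data[start:])
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(8192))
edited = b"INSERTED" + original   # every original byte shifts by 8 positions

old_chunks = chunk(original)
new_chunks = set(chunk(edited))
reused = [c for c in old_chunks[1:] if c in new_chunks]
```

Because boundaries follow content rather than offsets, every chunk of `original` after the first reappears unchanged in `edited`, so a de-duplicating store would only need to keep the new leading chunk. With fixed-size blocks the 8-byte insertion would have shifted every block.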

  23. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      • Metadata modification records (file, directory, inode map)
        – Dirty metadata annotated with time-stamp (for cleaning)
        – Written out to log
      • Requires efficient cleaning
        – Large amount of dirty metadata!
      • Resource management issues
      Diagram labels: Write Buffer (dirty data); Chunker; Metadata Modification Records (dirty metadata): File offset_range → CA, Directory additions/removals, Inode map de/allocations

  24. File Server
      • Write buffer
        – Accumulates written data; flushed on sync
        – Helps re-order NFS packets arriving out-of-order
      • Chunker
        – Decides block boundaries (based on data content)
      • Metadata modification records (file, directory, inode map)
        – Dirty metadata annotated with time-stamp (for cleaning)
        – Written out to log
      • Block cache
        – Clean data and metadata (not de-serialized)
      Diagram labels: Write Buffer (dirty data); Chunker; Metadata Modification Records (dirty metadata): File offset_range → CA, Directory additions/removals, Inode map de/allocations; Block Cache (clean data & metadata): CA → block data
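
The role of time-stamps in cleaning dirty metadata might look roughly like the sketch below. The slides only state that records are time-stamped and written to a log; the class and field names, the one-time-stamp-per-record scheme, and the rule that records at or below a committed time-stamp can be dropped are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModRecord:
    timestamp: int
    kind: str       # "file", "directory", or "inode_map"
    detail: tuple   # e.g. (inode, (start, end), content_address)


@dataclass
class FileServerMetadata:
    next_ts: int = 1
    dirty: list = field(default_factory=list)  # in-memory dirty metadata
    log: list = field(default_factory=list)    # stand-in for the update log

    def record(self, kind, detail):
        rec = ModRecord(self.next_ts, kind, detail)
        self.next_ts += 1
        self.dirty.append(rec)
        self.log.append(rec)   # every modification is also written to the log
        return rec.timestamp

    def clean(self, committed_ts):
        # Assumption: records at or below the committed time-stamp are already
        # reflected in the persistent filesystem and no longer need memory.
        before = len(self.dirty)
        self.dirty = [r for r in self.dirty if r.timestamp > committed_ts]
        return before - len(self.dirty)


meta = FileServerMetadata()
t1 = meta.record("file", (7, (0, 12288), "CA 1"))
t2 = meta.record("directory", ("add", "report.txt", 7))
t3 = meta.record("inode_map", ("allocate", 7))
dropped = meta.clean(committed_ts=2)
```

After the call to `clean`, only the record with time-stamp 3 remains dirty, while the log still holds all three entries.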

  25. Write Processing
      Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; Chunker; Block Store

  26. Write Processing
      Diagram labels: [0, 8 KB); Write Buffer; Metadata Modification Records; Block Cache; Chunker; Block Store

  27. Write Processing
      Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [0, 8 KB); Chunker; Block Store

  28. Write Processing
      Diagram labels: [8 KB, 16 KB); Write Buffer; Metadata Modification Records; Block Cache; [0, 8 KB); Chunker; Block Store

  29. Write Processing
      Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [8 KB, 16 KB); [0, 8 KB); Chunker; Block Store

  30. Write Processing
      Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [12 KB, 16 KB); Chunker; 12 KB of data; Block Store

  31. Write Processing
      Diagram labels: Write Buffer; Metadata Modification Records; Block Cache; [12 KB, 16 KB); Chunker; CA 1; Data blocks; 12 KB of data; Block Store
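
The build sequence in slides 25 to 31 appears to show two 8 KB writes accumulating in the write buffer, the chunker cutting a 12 KB block, the remaining [12 KB, 16 KB) staying buffered, and the block store returning CA 1. The sketch below walks through that flow. The `WriteBuffer` class, the hard-coded 12 KB cut in `find_boundary`, and the SHA-256 stand-in for the store's reply are hypothetical; the writes are delivered out of order here only to exercise the re-ordering role mentioned for the write buffer.

```python
import hashlib

KB = 1024


class WriteBuffer:
    def __init__(self):
        self.pending = {}   # file offset -> bytes, possibly received out of order
        self.flushed = 0    # file offset up to which data has been chunked

    def write(self, offset: int, data: bytes) -> None:
        self.pending[offset] = data

    def contiguous(self) -> bytes:
        # Stitch together the run of buffered data starting at `flushed`.
        out, pos = b"", self.flushed
        while pos in self.pending:
            out += self.pending[pos]
            pos += len(self.pending[pos])
        return out

    def consume(self, length: int) -> bytes:
        data = self.contiguous()
        assert length <= len(data)
        # Drop the stitched ranges, then keep the unconsumed tail as dirty data.
        pos = self.flushed
        while pos in self.pending:
            pos += len(self.pending.pop(pos))
        self.flushed += length
        if length < len(data):
            self.pending[self.flushed] = data[length:]
        return data[:length]


def find_boundary(data: bytes) -> int:
    # Placeholder for the content-defined chunker: the slides show a 12 KB
    # block being cut, so that length is simply hard-coded here.
    return 12 * KB if len(data) >= 12 * KB else 0


buf = WriteBuffer()
records = []                        # (offset_range, content address) pairs
buf.write(8 * KB, b"b" * 8 * KB)    # [8 KB, 16 KB) arrives first
buf.write(0, b"a" * 8 * KB)         # [0, 8 KB) arrives late

cut = find_boundary(buf.contiguous())
if cut:
    start = buf.flushed
    block = buf.consume(cut)
    ca = hashlib.sha256(block).hexdigest()   # stand-in for the store's reply
    records.append(((start, start + cut), ca))
```

At the end of the run, `records` holds one entry mapping the range (0, 12288) to an address, and the buffer retains the 4 KB covering [12 KB, 16 KB) until more data arrives or a sync forces a flush.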
