HYDRAstor: a Scalable Secondary Storage

7th USENIX Conference on File and Storage Technologies (FAST '09), February 26th, 2009
C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki, C. Ungureanu


  1. HYDRAstor: a Scalable Secondary Storage. 7th USENIX Conference on File and Storage Technologies (FAST '09), February 26th, 2009. C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki, C. Ungureanu

  2. HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC
     Scalable secondary storage
     Characteristics and the requirements they imply:
     ● Huge amount of data: scalability (dynamic); low cost per TB
     ● Small backup windows: very high write performance
     ● Duplication between backup streams: global deduplication
     ● Reliable, on-line retrieval: failure tolerance; high restore performance
     ● Varying value of data: adjust resilience overhead; data deletion

  7. Challenges
     ● High-performance, decentralized global deduplication
       ... in a dynamic, distributed system
       ... with deletion and failures
     ● Combination introduces complexity
     ● Tension between:
       ● Deduplication and dynamic scalability
       ● Deduplication and on-demand deletion
       ● Failure tolerance and deletion

  8. HYDRAstor
     ● Satisfies scalable secondary storage requirements
     ● Started as a research project at NEC Laboratories America, in Princeton, NJ
     ● Successfully commercialized
     ● Today: real-world, commercial system
     ● Sold by NEC in the US and Japan
     ● Development of back-end continues at 9LivesData, LLC in Warsaw, Poland
     ● Spinoff from NEC Laboratories

  9. HYDRAstor functionality
     ● Content addressable storage (CAS)
     ● Vast data repository
     ● Storing and extracting streams of blocks
     ● Single system image built of independent nodes
     ● Support for standard access methods
       ● Filesystem, VTL
     ● Dynamic capacity sharing
     ● Self-recovery from failures
     ● On-demand deletion

  10-14. Programming Model (built up incrementally over five slides)
     ● Repository of blocks
       ● Content-addressed
       ● Immutable
       ● Variable-sized
       ● Exposed pointers to other blocks
     ● Trees of blocks
     ● DAGs due to deduplication
     ● No cycles possible
     ● Deletion of whole trees
     [Diagram: two block trees, rooted at Root1 (hash=010..1) and Root2 (hash=110..0). Blocks marked E expose pointers to other blocks; one of the pointed-to blocks has hash=011..0.]

  17. Programming Model (same bullets as the preceding Programming Model slides)
     [Diagram: Root1 and its tree no longer appear. Root2 (hash=110..0) remains, together with the block with hash=011..0.]
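
The programming model on these slides can be illustrated with a small in-memory sketch. It is not HYDRAstor's API: the `BlockStore` class, its method names, the use of SHA-256, and the mark-and-sweep deletion are all assumptions made for illustration, since the slides do not say how addresses are computed or how deletion is implemented. The sketch shows why writing the same content twice yields one block (deduplication turns trees into DAGs), why cycles cannot form (a block's address depends on the addresses it points to, so it can only point to blocks that already exist), and how deleting one tree leaves blocks shared with another tree intact.

```python
import hashlib


class BlockStore:
    """Toy repository of immutable, content-addressed blocks (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # address -> (data, pointers)
        self.roots = set()  # addresses of trees that must be kept

    def write(self, data, pointers=()):
        """Store a block and return its address, derived from data and pointers."""
        pointers = tuple(pointers)
        for p in pointers:
            if p not in self.blocks:
                raise KeyError("pointer to unknown block: " + p)
        h = hashlib.sha256()
        h.update(len(data).to_bytes(8, "big"))  # separate data from pointers
        h.update(data)
        for p in pointers:
            h.update(bytes.fromhex(p))
        address = h.hexdigest()
        # Identical content maps to the same address, so duplicates are stored once.
        self.blocks.setdefault(address, (data, pointers))
        return address

    def retain(self, address):
        """Mark a block as the root of a tree to keep."""
        self.roots.add(address)

    def delete_tree(self, root):
        """Drop a root, then sweep every block no remaining root can reach.

        Assumption: blocks that were written but never placed under a
        retained root are also swept.
        """
        self.roots.discard(root)
        live = set()
        stack = list(self.roots)
        while stack:
            address = stack.pop()
            if address in live:
                continue
            live.add(address)
            stack.extend(self.blocks[address][1])
        for address in list(self.blocks):
            if address not in live:
                del self.blocks[address]


store = BlockStore()
shared = store.write(b"chunk present in both backups")
a = store.write(b"chunk only in backup 1")
b = store.write(b"chunk only in backup 2")
root1 = store.write(b"backup 1", [shared, a])
root2 = store.write(b"backup 2", [shared, b])
store.retain(root1)
store.retain(root2)

# Writing the same content again returns the existing address.
duplicate = store.write(b"chunk present in both backups")

# Deleting Root1's tree removes root1 and a, but not the shared block.
store.delete_tree(root1)
```

After `delete_tree(root1)`, the store holds only `root2`, `b`, and `shared`, mirroring the slide in which Root1 disappears while the block reachable from Root2 remains.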

  18. Architecture overview
     ● Standard server-grade hardware running Linux
     ● Scalability on data-center level
     [Diagram: NFS / CIFS; Front-end (Access Nodes); Internal Network; Back-end (CAS Layer, Storage Nodes).]

  19. Data organization: selected requirements
     Requirements on scalable storage and the internal data services they require:
     ● Failure tolerance: identify data resilience reduction; fast data rebuilding
     ● High performance: preserve locality of data streams; prefetching
     ● Dynamic scalability: decentralized data management; load balancing; fast data transfer to new location
     ● Deduplication: location of potential duplicates; availability & resiliency verification
     ● On-demand deletion: failure-tolerant, distributed deletion
