HYDRAstor: a Scalable Secondary Storage
7th TF-Storage Meeting, September 9th, 2010
Łukasz Heldt
NEC (www.nec.com)
- Largest Japanese IT company
- $43 billion in annual revenue
- 143,000 staff
- Owns & sells HYDRAstor
9LivesData (www.9livesdata.com)
- Polish R&D company
- 50 engineers and scientists
- R&D of critical backend component
HYDRAstor
- Scalable disk-based storage for backup with global deduplication
- Started in 2003 in NEC Labs by Cezary Dubnicki
- 2007: Product of the Year award by SearchStorage.com
- 2008: Product Innovation award by Network Products Guide
- 2009/2010: FAST conference publication in San Jose
- Sold in the US and Japan since 2007
- Will be sold in Poland in 2011 by 9LivesData in cooperation with NEC
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 3
Backup storage
- Tapes are most common, despite:
- Sensitive environment requirements
- Unreliable restore
- Low performance
- Manual labor or expensive robots
- Problematic replication
Backup storage size
- Usual backup policy
- 4-12+ full backups
- 7-30+ incremental backups
- Majority of data does not change
- Data compression 2:1
- Secondary storage size:
- 5x-20x more than primary storage
- Includes many copies of the same data
- Each data chunk stored 5-10+ times
High potential for the deduplication technology.
Deduplication
- Save disk space by eliminating duplicates
- Sample reduction ratio 10:1 (depends on backup policy)
- Lowers the price per gigabyte
Sub-file level deduplication
File A: blocks B, C, A
File B: blocks D, E, A
Stored blocks: B, C, D, A, E (only unique blocks are stored)
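The idea of storing only unique blocks can be sketched in a few lines of Python. This is a minimal illustration, not the DataRedux™ implementation: the `DedupStore` class, the use of SHA-256 as the block hash, and the single-letter block contents (borrowed from the figure) are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps each unique block exactly once."""

    def __init__(self):
        self.blocks = {}  # block hash -> block contents
        self.files = {}   # file name -> ordered list of block hashes

    def write_file(self, name, blocks):
        """Store a file given as a list of byte blocks; return how many were new."""
        new_blocks = 0
        recipe = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # unseen content: store it
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)          # a duplicate only costs a reference
        self.files[name] = recipe
        return new_blocks

    def read_file(self, name):
        """Reassemble a file from its block hashes."""
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
new_a = store.write_file("File A", [b"B", b"C", b"A"])
new_b = store.write_file("File B", [b"D", b"E", b"A"])
print(new_a, new_b, len(store.blocks))  # 3 2 5
```

Six blocks are written but only five are stored, because block A is shared by both files.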
Global deduplication
- Prevent silos of deduped data
- One system to manage
Global vs. siloed dedup
HYDRAstor product
- Provides
- global deduplication using DataRedux™
- performance, storage scalability and data resiliency using Distributed Resilient Data™
HYDRAstor deployment
- Interface: CIFS, NFS, Symantec OST
- Marker filtering for: Tivoli, Netbackup, Networker, CommVault
HYDRAstor architecture
- Accelerator Nodes realize performance
- Storage Nodes realize capacity
Diagram labels: NFS / CIFS / OST over Ethernet, Accelerator Nodes, Internal Network, Storage Nodes
Non-disruptive grid expansion
HYDRAstor scalability
- MiniHYDRA – single server
- Storage: 12 TB – 240 TB*
- Performance: 1.3 TB / hour
- 2AN 4SN
- Storage: 48 TB – 960 TB*
- Performance: 3.6 TB / hour
- 20AN 40SN (4 racks)
- Storage: 480 TB – 9600 TB*
- Performance: 36 TB / hour
* - assuming 20x data reduction through DataRedux™
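The capacity ranges above follow directly from the footnote: the lower figure is raw capacity and the upper figure is raw capacity times the assumed 20x reduction. A small sketch of that arithmetic, which also shows that throughput per Accelerator Node is the same in both grid configurations (the `CONFIGS` table and function name are illustrative, not part of the product):

```python
# Configurations from the slide: (name, accelerator nodes, raw TB, TB per hour).
# MiniHYDRA is a single server, so no accelerator-node count is given for it.
CONFIGS = [
    ("MiniHYDRA", None, 12, 1.3),
    ("2AN 4SN", 2, 48, 3.6),
    ("20AN 40SN", 20, 480, 36.0),
]
REDUCTION_RATIO = 20  # the slide's assumed data reduction ratio


def effective_capacity_tb(raw_tb, ratio=REDUCTION_RATIO):
    """Logical capacity seen by backups for a given raw capacity."""
    return raw_tb * ratio


for name, an_count, raw_tb, tb_per_hour in CONFIGS:
    line = f"{name}: {raw_tb} TB raw -> {effective_capacity_tb(raw_tb)} TB effective"
    if an_count:
        line += f", {tb_per_hour / an_count:.1f} TB/hour per AN"
    print(line)
```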
HYDRAstor scalability
- Slide from a presentation by Curtis Preston, a well-known storage analyst who runs an independent consulting company
HYDRAstor other features
- Fully automatic/non-disruptive mgmt
- Recovery of lost data resiliency
- Periodic data scrubbing
- Machine and disk failure recovery
- Configurable redundancy level
- erasure coding – better than RAID6
- Optimized replication
- Smart resource management
HYDRAstor backend design
Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf
Programming Model
- Repository of blocks
- Content-addressed
- Immutable
- Variable-sized
- Exposed pointers to other blocks
- Trees of blocks
- DAGs due to deduplication
- No cycles possible
- Deletion of whole trees
Diagram: two trees of blocks rooted at Root1 (hash=010..1) and Root2 (hash=110..0) that share a deduplicated block (hash=011..0), referenced through the pointer 011..0
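The programming model can be illustrated with a toy content-addressed store. This is a sketch under stated assumptions, not the HYDRAstor API: `BlockStore`, `put`, `get`, and `reachable` are hypothetical names, and SHA-256 stands in for whatever hash the system uses. Because a block's address is derived from its data and from the addresses of the blocks it points to, a block can only point to blocks that already exist, which is why cycles cannot be formed.

```python
import hashlib


class BlockStore:
    """Toy content-addressed repository of immutable, variable-sized blocks."""

    def __init__(self):
        self._blocks = {}  # address -> (data, tuple of child addresses)

    def put(self, data, pointers=()):
        """Store a block and return its address, derived from its content."""
        pointers = tuple(pointers)
        for child in pointers:
            if child not in self._blocks:
                raise KeyError(f"unknown child block {child}")
        h = hashlib.sha256()
        h.update(len(data).to_bytes(8, "big"))  # separate data from pointers
        h.update(data)
        for child in pointers:
            h.update(bytes.fromhex(child))
        address = h.hexdigest()
        # Writing identical content again yields the same address (deduplication).
        self._blocks.setdefault(address, (bytes(data), pointers))
        return address

    def get(self, address):
        """Return (data, child addresses) for a stored block."""
        return self._blocks[address]

    def reachable(self, root):
        """Return every address in the tree (DAG) under root."""
        seen, stack = set(), [root]
        while stack:
            addr = stack.pop()
            if addr not in seen:
                seen.add(addr)
                stack.extend(self._blocks[addr][1])
        return seen


store = BlockStore()
shared = store.put(b"shared data")
root1 = store.put(b"backup 1", [store.put(b"only in backup 1"), shared])
root2 = store.put(b"backup 2", [shared])
print(store.put(b"shared data") == shared)  # True: same content, same address
```

Both roots reach the same `shared` block, so the two trees together form a DAG rather than two disjoint trees.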
Failure tolerance: erasure coding
Example: N=8, m=5
Encode: original block → original fragments + redundant fragments
Decode: any 3 fragments can be lost
Assuming a 12-disk array:
             Mirror   3-copy   RAID6   Erasure coding   Erasure coding
Resiliency   1        2        2       2                3
Overhead     100%     200%     20%     20%              33%
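The resiliency and overhead figures for erasure coding follow from the fragment counts. A minimal sketch of that arithmetic, assuming N is the total number of fragments and m the number of original fragments needed to rebuild a block (the function name is illustrative):

```python
def erasure_profile(total_fragments, original_fragments):
    """Return (tolerated_losses, overhead) for an m-of-N erasure code."""
    if not 0 < original_fragments <= total_fragments:
        raise ValueError("need 0 < original_fragments <= total_fragments")
    redundant = total_fragments - original_fragments
    return redundant, redundant / original_fragments


# The slide's example (N=8, m=5), then two 12-fragment layouts that are
# consistent with the erasure-coding columns of the 12-disk comparison.
for n, m in [(8, 5), (12, 10), (12, 9)]:
    losses, overhead = erasure_profile(n, m)
    print(f"N={n}, m={m}: survives {losses} lost fragments, overhead {overhead:.0%}")
```

For N=8, m=5 this gives 3 tolerated losses at 60% overhead; the 12-fragment layouts give 20% and 33%, compared with 100% for a mirror that survives only one failure.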
Scalability with DHT: data placement
Diagram: prefix tree (empty prefix at the root; prefixes 00, 01, 10, 11) with N=4 components per prefix hosted on Storage Nodes 1-6; the block with hash=011..0 maps to prefix 01
- Block location: DHT with prefix routing
- Block mapped to hash prefix
- Prefix components
- Hosted on SNs
- N components per prefix
- Store fragments
- Distributed consensus
- Load balancing
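Prefix routing can be sketched as a lookup of the one prefix that matches the leading bits of a block's hash. The code below is an illustration only: the function names are hypothetical, hashes are written as bit strings for readability, and the `prefix:index` component naming follows the labels used on the later data-services slides.

```python
PREFIXES = ["00", "01", "10", "11"]  # prefixes shown in the diagram
N_COMPONENTS = 4                     # N=4 on the slide


def route(hash_bits, prefixes=PREFIXES):
    """Return the prefix responsible for a block hash given as a bit string."""
    matches = [p for p in prefixes if hash_bits.startswith(p)]
    if len(matches) != 1:
        raise ValueError(f"prefixes must cover each hash exactly once: {matches}")
    return matches[0]


def components(prefix, n=N_COMPONENTS):
    """Name the N components of a prefix, e.g. '01:0' .. '01:3'."""
    return [f"{prefix}:{i}" for i in range(n)]


prefix = route("0110")
print(prefix, components(prefix))
# Prefixes need not have equal length, as long as no prefix extends another.
print(route("110", ["0", "10", "11"]))
```

Each of the N components of the chosen prefix would then store one fragment of the block.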
Data organization: synchrun chains
Diagram: data stream blocks A B E C D F G with hashes 010…, 101…, 110…, 011…, 000…, 011…, 100…; after compression and erasure coding, the fragments of blocks A, D and F routed to prefix 01 are stored by its components, grouped into Synchrun 1, Synchrun 2 and Synchrun 3, and kept in Containers
- Data stream split into blocks
- Hashes of blocks computed
- Routing through DHT
- Erasure-coded fragments stored by components
- Grouped into synchruns
- Containers stored on disks
- Fragment metadata stored separately from data
- Ordered synchrun chains
- Preserve order & locality
- Manageable
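A rough sketch of how ordered synchrun chains could arise from the steps listed above. Everything here is an assumption made for illustration: `encode` is a placeholder that only labels fragments (it performs no real compression or erasure coding), the two-blocks-per-synchrun limit is arbitrary, and the hashes are hypothetical values chosen so that A, D and F fall under prefix 01 as in the diagram.

```python
N_COMPONENTS = 4          # assumed, matching components 01:0 .. 01:3
BLOCKS_PER_SYNCHRUN = 2   # arbitrary limit chosen for the example


def encode(block_name, n=N_COMPONENTS):
    """Placeholder for compression + erasure coding: one labelled fragment per component."""
    return [f"{block_name}/frag{i}" for i in range(n)]


def build_chains(stream, prefix, n=N_COMPONENTS, per_run=BLOCKS_PER_SYNCHRUN):
    """Return, per component, an ordered chain (list) of synchruns (lists of fragments)."""
    chains = [[] for _ in range(n)]
    for name, hash_bits in stream:
        if not hash_bits.startswith(prefix):
            continue  # routed to another prefix by the DHT
        for i, fragment in enumerate(encode(name, n)):
            chain = chains[i]
            if not chain or len(chain[-1]) == per_run:
                chain.append([])  # open a new synchrun
            chain[-1].append(fragment)
    return chains


# Hypothetical hashes: only A, D and F start with 01.
stream = [("A", "010"), ("B", "101"), ("E", "110"), ("C", "111"),
          ("D", "011"), ("F", "011"), ("G", "100")]
chains = build_chains(stream, "01")
print(chains[0])  # [['A/frag0', 'D/frag0'], ['F/frag0']]
```

Every component sees the blocks in the same stream order, so the chains on peer components line up synchrun by synchrun, which is what makes them easy to compare and manage.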
Data Services: Identification of data resiliency level
Diagram labels: Component 01:0, Component 01:1, Component 01:2, Component 01:3, Missing fragments, Chain scanning
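The slides do not spell out the scanning algorithm, so the following is only a guess at its general shape: compare the chains held by peer components, count how many fragments of each block are present, and derive how many more losses each block can survive. The function names, the list-of-block-ids representation of a chain, and the value m=3 are all assumptions.

```python
M_ORIGINAL = 3  # assumed number of fragments needed to rebuild a block


def resiliency_levels(component_chains, m=M_ORIGINAL):
    """Map each block to the number of further fragment losses it can survive."""
    present = {}
    for chain in component_chains.values():
        for block in chain:
            present[block] = present.get(block, 0) + 1
    return {block: count - m for block, count in present.items()}


def missing_fragments(component_chains):
    """List (component, block) pairs where a component lacks a block seen elsewhere."""
    all_blocks = set().union(*component_chains.values())
    return sorted((comp, block)
                  for comp, chain in component_chains.items()
                  for block in all_blocks if block not in chain)


chains = {
    "01:0": ["A", "D", "F"],
    "01:1": ["A", "F"],        # fragment of D is missing
    "01:2": ["A", "D", "F"],
    "01:3": ["A", "D"],        # fragment of F is missing
}
print(resiliency_levels(chains))
print(missing_fragments(chains))
```

A level of 0 means the block is still readable but cannot lose another fragment; a negative level would mean it can no longer be rebuilt.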
Data services: reconstruction
Diagram labels: Component 01:0, Component 01:1, Component 01:2, Component 01:3
- Sequential read/write of entire Containers
- Erasure decoding and re-encoding
Data services: fast data transfer
Diagram labels: Component 01:0, Component 01:1, Component 01:2, Component 01:3, Old component 01:3, Location of new node (DHT), Data transfer
Data services for deduplication
Diagram: a block with hash=011.. is checked against components 01:0, 01:1, 01:2 and 01:3
- Choose complete chain
- Completeness: “definitely not a duplicate”
- Deletion interaction: wasn't the block scheduled for deletion?
- Query
- Local candidate found
- Candidate verification
- Successful dedup
On-demand data deletion
- Distributed garbage collection
- Per-block reference counter stored per fragment
- Failure-tolerant
- Block reference counter calculated independently on peer Container chains
- Interference with duplicate elimination:
- resurrection of duplicates after garbage collection
- space reclamation in background
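Reference counting over a DAG of blocks can be sketched as follows. This is a single-process illustration with hypothetical names; it does not attempt the distributed, failure-tolerant computation on peer Container chains described above. Counters are derived from the retained roots, and any block whose counter is zero can be reclaimed.

```python
def reference_counts(blocks, roots):
    """Count, for every block, how many retained roots or live blocks point to it.

    blocks: address -> list of child addresses (the DAG of blocks)
    roots:  set of addresses of trees that are still retained
    """
    counts = {addr: 0 for addr in blocks}
    for root in roots:
        counts[root] += 1  # retention itself counts as a reference
    live, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr in live:
            continue
        live.add(addr)
        for child in blocks[addr]:
            counts[child] += 1
            stack.append(child)
    return counts


def reclaimable(blocks, roots):
    """Blocks that no retained root or live block refers to."""
    counts = reference_counts(blocks, roots)
    return sorted(addr for addr, n in counts.items() if n == 0)


# Two trees sharing one deduplicated block.
blocks = {
    "Root1": ["E1", "shared"],
    "Root2": ["E2", "shared"],
    "E1": [], "E2": [], "shared": [],
}
print(reclaimable(blocks, {"Root1", "Root2"}))  # []
print(reclaimable(blocks, {"Root2"}))           # ['E1', 'Root1']
```

After the Root1 tree is deleted, the shared block survives because Root2 still references it.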
Resource management
- Configurable load balancing between:
- backup/restore
- background tasks (reconstruction, transfer, etc.)
- garbage collection
- Shares depend on system state
- Assigns priority of tasks automatically
- e.g. reconstruction before transfer or space reclamation
- Maximizes resource utilization
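The automatic prioritization of background tasks can be pictured as a priority queue. The sketch below is an assumption-laden illustration, not the product's scheduler: the slide only states that reconstruction comes before transfer or space reclamation, so those two are given equal priority here and run in arrival order.

```python
import heapq

# Lower number runs first. Only "reconstruction first" comes from the slide;
# transfer and space reclamation are deliberately left at the same level.
PRIORITY = {"reconstruction": 0, "transfer": 1, "space reclamation": 1}


def run_order(tasks):
    """Return background tasks in execution order (arrival order within a priority)."""
    heap = [(PRIORITY[kind], seq, kind, name)
            for seq, (kind, name) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, kind, name = heapq.heappop(heap)
        order.append((kind, name))
    return order


tasks = [("transfer", "t1"), ("space reclamation", "s1"), ("reconstruction", "r1")]
print(run_order(tasks))
```

The reconstruction task is scheduled first even though it arrived last.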
Topics for further discussion
- Features and technical details of HYDRAstor
- Sales of HYDRAstor in Poland
- Cooperation with 9LivesData on other projects