HYDRAstor: a Scalable Secondary Storage, 7th TF-Storage Meeting

SLIDE 1

HYDRAstor: a Scalable Secondary Storage

7th TF-Storage Meeting

September 9th 2010 Łukasz Heldt

SLIDE 2

NEC (www.nec.com)

  • Largest Japanese IT company
  • $43 billion in annual revenue
  • 143,000 staff
  • Owns & sells

9LivesData (www.9livesdata.com)

  • Polish R&D company
  • 50 engineers and scientists
  • R&D of critical backend component

HYDRAstor

  • Scalable disk-based storage for backup with global deduplication
  • Started in 2003 in NEC Labs by Cezary Dubnicki
  • 2007 Product of the Year award by SearchStorage.com
  • 2008 Product Innovation award by Network Products Guide
  • 2009/2010 FAST conference publication in San Jose
  • Sold in the US and Japan since 2007
  • Will be sold in Poland in 2011 by 9LivesData in cooperation with NEC

SLIDE 3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 3

Backup storage

  • Tapes are most common, despite:
      • Sensitive environment requirements
      • Unreliable restore
      • Low performance
      • Manual labor or expensive robots
      • Problematic replication
SLIDE 5

Backup storage size

  • Usual backup policy
      • 4-12+ full backups
      • 7-30+ incremental
  • Majority of data does not change
  • Data compression 2:1
  • Secondary storage size:
      • 5x-20x more than primary storage
      • Includes many copies of the same data
      • Each data chunk stored 5-10+ times

High potential for the deduplication technology.
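
The sizing argument above can be sketched as a small calculation. This is only an illustration: the function name and the `change_rate` parameter (the fraction of primary data captured by each incremental backup) are assumptions, not figures from the slides.

```python
def secondary_to_primary_ratio(full_backups, incrementals, change_rate, compression=2.0):
    """Retained backup size as a multiple of primary storage, without deduplication.

    change_rate is an assumed fraction of the primary data held by each
    incremental backup; compression is the 2:1 ratio quoted on the slide.
    """
    logical_copies = full_backups + incrementals * change_rate
    return logical_copies / compression


if __name__ == "__main__":
    # Policies taken from the ranges on the slide, with an assumed 10% change rate.
    for fulls, incs in [(4, 7), (12, 30)]:
        ratio = secondary_to_primary_ratio(fulls, incs, change_rate=0.1)
        print(f"{fulls} full + {incs} incremental -> {ratio:.2f}x primary")
```

Since most of each full backup repeats data that is already stored, nearly all of that multiple is duplicate data.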

SLIDE 6

Deduplication

  • Save disk space by eliminating duplicates
  • Sample reduction ratio 10:1 (depends on backup policy)
  • Lowers the price per gigabyte

Sub-file level deduplication

Figure: File A and File B are split into blocks (B C A and D E A). The stored blocks are B C D A E: only unique blocks are stored.
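
A minimal sketch of sub-file deduplication, assuming fixed-size blocks and SHA-256 as the block identifier. Both choices, and all names below, are illustrative assumptions; the slides do not say how HYDRAstor chunks or hashes data at this point.

```python
import hashlib


def dedup_store(files, block_size=4):
    """Split each file into blocks and keep one copy of every unique block."""
    store = {}    # block hash -> block bytes (unique blocks only)
    recipes = {}  # file name -> ordered list of block hashes
    for name, data in files.items():
        hashes = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # a duplicate block is not stored again
            hashes.append(digest)
        recipes[name] = hashes
    return store, recipes


def restore(store, recipe):
    """Rebuild a file from its list of block hashes."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Mirrors the figure: File A = B C A, File B = D E A, block A is shared.
    files = {"File A": b"BBBBCCCCAAAA", "File B": b"DDDDEEEEAAAA"}
    store, recipes = dedup_store(files)
    print(len(store), "unique blocks stored for 6 logical blocks")
```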

SLIDE 7

Global deduplication

  • Prevent silos of deduped data
  • One system to manage

Global vs. siloed dedup

SLIDE 8

HYDRAstor product

  • Provides
      • global deduplication using DataRedux™
      • performance, storage scalability and data resiliency using Distributed Resilient Data™

SLIDE 9

HYDRAstor deployment

  • Interface: CIFS, NFS, Symantec OST
  • Marker filtering for: Tivoli, Netbackup, Networker, CommVault

SLIDE 11

HYDRAstor architecture

  • Accelerator Nodes realize performance
  • Storage Nodes realize capacity

Figure labels: NFS / CIFS / OST over Ethernet, Accelerator Nodes, Internal Network, Storage Nodes.

Non-disruptive grid expansion

SLIDE 12

HYDRAstor scalability

  • MiniHYDRA – single server
      • Storage: 12 TB – 240 TB*
      • Performance: 1.3 TB / hour
  • 2AN 4SN
      • Storage: 48 TB – 960 TB*
      • Performance: 3.6 TB / hour
  • 20AN 40SN (4 racks)
      • Storage: 480 TB – 9600 TB*
      • Performance: 36 TB / hour

* assuming 20x data reduction through DataRedux™
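
The capacity ranges follow from the footnote if the lower bound is read as capacity without data reduction and the upper bound as that capacity multiplied by 20. That reading is an assumption, as are the names in this short check, which also shows that the quoted throughput of the two multi-node configurations is the same per Accelerator Node.

```python
DATA_REDUCTION = 20  # the 20x reduction assumed in the slide's footnote

# (capacity before reduction in TB, TB per hour, number of Accelerator Nodes)
CONFIGS = {
    "2AN 4SN": (48, 3.6, 2),
    "20AN 40SN": (480, 36.0, 20),
}


def effective_capacity_tb(raw_tb, reduction=DATA_REDUCTION):
    """Capacity as seen by backups once data reduction is applied."""
    return raw_tb * reduction


def throughput_per_an(tb_per_hour, accelerator_nodes):
    """Throughput contributed by each Accelerator Node."""
    return tb_per_hour / accelerator_nodes


if __name__ == "__main__":
    for name, (raw, perf, ans) in CONFIGS.items():
        print(name, effective_capacity_tb(raw), "TB effective,",
              throughput_per_an(perf, ans), "TB/hour per AN")
```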

SLIDE 13

HYDRAstor scalability

  • Slide from a presentation by Curtis Preston

Curtis Preston is a well-known storage analyst who owns an independent consulting company.

SLIDE 14

HYDRAstor other features

  • Fully automatic/non-disruptive management
  • Recovery of lost data resiliency
  • Periodic data scrubbing
  • Machine and disk failure recovery
  • Configurable redundancy level
      • erasure coding – better than RAID6
  • Optimized replication
  • Smart resource management

SLIDE 15

HYDRAstor backend design

Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf

SLIDE 20

Programming Model

  • Repository of blocks
  • Content-addressed
  • Immutable
  • Variable-sized
  • Exposed pointers to other blocks
  • Trees of blocks
  • DAGs due to deduplication
  • No cycles possible
  • Deletion of whole trees

Figure: two block trees with roots Root1 and Root2. Blocks are labeled with their hashes (hash=010..1, hash=110..0, hash=011..0), and a pointer to a block holds that block's hash (011..0).
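
The model above can be illustrated with a toy in-memory repository. This is a sketch under assumptions, not the HYDRAstor API: the class, its method names, and the use of SHA-256 over the data plus the child addresses are all hypothetical.

```python
import hashlib


class BlockStore:
    """Toy content-addressed repository of immutable blocks."""

    def __init__(self):
        self.blocks = {}  # address -> (data, tuple of child addresses)

    def put(self, data, pointers=()):
        """Store a block and return its address (a hash of data and pointers)."""
        hasher = hashlib.sha256(data)
        for pointer in pointers:
            if pointer not in self.blocks:
                # A child must exist before its parent, so cycles cannot form.
                raise KeyError("pointer to unknown block")
            hasher.update(bytes.fromhex(pointer))
        address = hasher.hexdigest()
        self.blocks.setdefault(address, (data, tuple(pointers)))  # deduplicates
        return address

    def reachable(self, roots):
        """All addresses reachable from the given roots."""
        seen, stack = set(), list(roots)
        while stack:
            address = stack.pop()
            if address not in seen:
                seen.add(address)
                stack.extend(self.blocks[address][1])
        return seen

    def delete_tree(self, root, live_roots):
        """Delete a whole tree, keeping blocks still reachable from live roots."""
        keep = self.reachable(live_roots)
        for address in self.reachable([root]) - keep:
            del self.blocks[address]
```

Writing the same content twice yields the same address, which is how two trees come to share a block and form a DAG.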

SLIDE 23

Programming Model

  • Deletion of whole trees

Figure: the tree rooted at Root1 is no longer shown. The tree rooted at Root2 (hash=110..0) and the block it points to (hash=011..0) remain.

SLIDE 25

Failure tolerance: erasure coding

Example: N=8, m=5

  • Encode: the original block becomes 5 original fragments plus 3 redundant fragments
  • Decode: any 3 fragments can be lost

Scheme            Resiliency   Overhead
Mirror            1            100%
3-copy            2            200%
RAID6             2            20%
Erasure coding    2            20%
Erasure coding    3            33%

Assuming a 12-disk array
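
The overhead figures for the 12-disk comparison can be reproduced with one line of arithmetic: overhead is the amount of redundant data divided by the amount of original data. The function name and the way each scheme is split into original and redundant parts are assumptions made for this sketch.

```python
def overhead_percent(original, redundant):
    """Storage overhead of a scheme: redundant units per original unit, in percent."""
    return round(100 * redundant / original)


# (original units, redundant units); resiliency equals the number of redundant units.
SCHEMES_12_DISKS = {
    "Mirror": (1, 1),                # one extra copy of each disk
    "3-copy": (1, 2),                # two extra copies
    "RAID6": (10, 2),                # 10 data disks + 2 parity disks
    "Erasure coding, 2": (10, 2),    # 10 original + 2 redundant fragments
    "Erasure coding, 3": (9, 3),     # 9 original + 3 redundant fragments
}

if __name__ == "__main__":
    for name, (original, redundant) in SCHEMES_12_DISKS.items():
        print(f"{name}: resiliency {redundant}, overhead {overhead_percent(original, redundant)}%")
    # The N=8, m=5 example: 5 original fragments and 3 redundant ones.
    print("N=8, m=5 overhead:", overhead_percent(5, 8 - 5), "%")
```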

SLIDE 34

Scalability with DHT: data placement

  • Block location: DHT with prefix routing
  • Block mapped to hash prefix
  • Prefix components
  • Hosted on SNs
  • N components per prefix
  • Store fragments
  • Distributed consensus
  • Load balancing

Figure: a block with hash=011..0 is mapped to prefix 01 in a prefix tree (labels: empty prefix, 1, 00, 01, 10, 11). With N=4, the components of each prefix are hosted on storage nodes labeled Node 1 to Node 6.
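
Prefix routing can be sketched as follows. The prefix set, the round-robin placement of components on nodes, and every function name are assumptions made for illustration; the slides only say that a block is mapped to a hash prefix, that each prefix has N components hosted on Storage Nodes, and that placement is subject to load balancing.

```python
import hashlib


def hash_bits(data, nbits=8):
    """First nbits of the SHA-256 digest of data, as a string of '0' and '1'."""
    digest = hashlib.sha256(data).digest()
    return "".join(format(byte, "08b") for byte in digest)[:nbits]


def route(bits, prefixes):
    """Return the single prefix that the hash bits start with."""
    matches = [prefix for prefix in prefixes if bits.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError("prefixes must cover the hash space without overlap")
    return matches[0]


def place_components(prefixes, nodes, n=4):
    """Assign N components per prefix to nodes round-robin (a stand-in for load balancing)."""
    placement, slot = {}, 0
    for prefix in prefixes:
        for index in range(n):
            placement[(prefix, index)] = nodes[slot % len(nodes)]
            slot += 1
    return placement


if __name__ == "__main__":
    prefixes = ["00", "01", "10", "11"]
    nodes = [f"Node {i}" for i in range(1, 7)]
    placement = place_components(prefixes, nodes)
    prefix = route("0110", prefixes)  # the slide's hash=011..0 lands on prefix 01
    print(prefix, [placement[(prefix, i)] for i in range(4)])
```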
SLIDE 44

Data organization: synchrun chains

  • Data stream split to blocks
  • Hashes of blocks computed
  • Routing through DHT
  • Erasure-coded fragments stored by components
  • Grouped into synchruns
  • Containers stored on disks
  • Fragment metadata separately from data
  • Ordered synchrun chains
  • Preserve order & locality
  • Manageable

Figure: a data stream is split into blocks A to G, each with its hash (010…, 101…, 110…, 011…, 000…, 011…, 100…). Blocks A, D and F are routed to prefix 01, go through compression and erasure coding, and their fragments are stored by the four components of that prefix. The fragments are grouped into Synchrun 1, Synchrun 2 and Synchrun 3, which are kept in containers.
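
A sketch of how stream order could be preserved per prefix. It assumes the hashes in the figure belong to blocks A to G in alphabetical order, uses a 2-bit prefix, and cuts a new synchrun after a fixed number of blocks; the function name and the `synchrun_size` parameter are hypothetical, and the real grouping rule is not given on the slides.

```python
from collections import defaultdict


def build_chains(blocks, prefix_len=2, synchrun_size=2):
    """Group blocks by hash prefix, in stream order, then cut each group into synchruns.

    blocks is a list of (name, hash_bits) pairs in the order they appear in the stream.
    Returns prefix -> ordered list of synchruns, each a list of block names.
    """
    per_prefix = defaultdict(list)
    for name, bits in blocks:
        per_prefix[bits[:prefix_len]].append(name)  # stream order is preserved
    return {
        prefix: [names[i:i + synchrun_size] for i in range(0, len(names), synchrun_size)]
        for prefix, names in per_prefix.items()
    }


if __name__ == "__main__":
    stream = [("A", "010"), ("B", "101"), ("C", "110"), ("D", "011"),
              ("E", "000"), ("F", "011"), ("G", "100")]
    for prefix, chain in sorted(build_chains(stream).items()):
        print(prefix, chain)
```

Keeping blocks of one stream next to each other in a chain is what lets later services read whole containers sequentially.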

SLIDE 45

Data Services: Identification of data resiliency level

Figure: components 01:0, 01:1, 01:2 and 01:3, with missing fragments marked.

SLIDE 46

Data Services: Identification of data resiliency level

Figure: chain scanning across components 01:0, 01:1, 01:2 and 01:3.
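
One way to picture chain scanning is as a comparison of the fragment metadata held by the components of a prefix. This is a simplified, centralized sketch: the function name, the data layout, and the definition of resiliency as "fragments present minus fragments needed to decode" are assumptions, not the product's algorithm.

```python
def scan_chains(components, m):
    """Report, per block, how many more fragments can be lost and where fragments are missing.

    components maps a component name to the set of blocks it holds a fragment of.
    m is the number of fragments needed to decode a block.
    """
    all_blocks = set().union(*components.values())
    report = {}
    for block in sorted(all_blocks):
        holders = [name for name, held in components.items() if block in held]
        missing = [name for name in components if name not in holders]
        report[block] = {"resiliency": len(holders) - m, "missing_on": missing}
    return report


if __name__ == "__main__":
    components = {
        "01:0": {"A", "D", "F"},
        "01:1": {"A", "F"},
        "01:2": {"A", "D", "F"},
        "01:3": {"A", "D"},
    }
    for block, info in scan_chains(components, m=3).items():
        print(block, info)
```

Blocks whose resiliency is below the configured level are the ones that need reconstruction.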

SLIDE 49

Data services: reconstruction

  • Sequential read/write of entire Containers
  • Erasure decoding and re-encoding

Figure: components 01:0, 01:1, 01:2 and 01:3.
SLIDE 53

Data services: fast data transfer

Figure: components 01:0, 01:1, 01:2 and 01:3, plus the old component 01:3. Label: Location of new node (DHT).

SLIDE 54

Data services: fast data transfer

Figure: components 01:0, 01:1, 01:2 and 01:3, plus the old component 01:3. Label: Data transfer.

SLIDE 57

Data services: fast data transfer

Figure: components 01:0, 01:1, 01:2 and 01:3, plus the old component 01:3.

SLIDE 58

Data services for deduplication

Choose complete chain

Completeness: “definitely not a duplicate”

Deletion interaction: wasn't the block scheduled for deletion?

Figure: a block with hash=011.. and components 01:0, 01:1, 01:2 and 01:3.

SLIDE 59

Data services for deduplication

Figure: a block with hash=011.. and components 01:0, 01:1, 01:2 and 01:3. Label: Query.

SLIDE 60

Data services for deduplication

Figure: a block with hash=011.. and components 01:0, 01:1, 01:2 and 01:3. Label: Local candidate found.

SLIDE 61

Data services for deduplication

Figure: a block with hash=011.. and components 01:0, 01:1, 01:2 and 01:3. Labels: Candidate verification, Successful dedup.

SLIDE 62

On-demand data deletion

  • Distributed garbage collection
  • Per-block reference counter stored per-fragment
  • Failure-tolerant
  • Block reference counter calculated independently on peer Container chains
  • Interference with duplicate elimination:
      • resurrection of duplicates after garbage collection
      • space reclamation in background
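
Reference counting is enough for garbage collection here because the block graph has no cycles. The sketch below is a centralized, in-memory illustration of that idea only; it does not model the distributed, per-fragment, failure-tolerant computation named above, and the function name and data layout are assumptions.

```python
def collect_garbage(blocks, roots):
    """Reclaim blocks that are not referenced, directly or indirectly, by a retained root.

    blocks maps a block address to the list of addresses it points to.
    roots lists the addresses of retained tree roots.
    Returns (surviving blocks, final reference counters).
    """
    counts = {address: 0 for address in blocks}
    for address in roots:
        counts[address] += 1  # a retained root counts as one reference
    for children in blocks.values():
        for child in children:
            counts[child] += 1

    dead = [address for address, count in counts.items() if count == 0]
    removed = set()
    while dead:
        address = dead.pop()
        removed.add(address)
        for child in blocks[address]:
            counts[child] -= 1  # the removed block no longer references its children
            if counts[child] == 0:
                dead.append(child)

    survivors = {a: c for a, c in blocks.items() if a not in removed}
    return survivors, counts


if __name__ == "__main__":
    blocks = {"root1": ["x", "shared"], "root2": ["shared"], "x": [], "shared": []}
    survivors, counts = collect_garbage(blocks, roots=["root2"])
    print(sorted(survivors), counts["shared"])
```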

SLIDE 63

Resource management

  • Configurable load balancing between:
      • backup/restore
      • background tasks (reconstruction, transfer, etc.)
      • garbage collection
  • Shares depend on system state
  • Assigns priority of tasks automatically
      • e.g. reconstruction before transfer or space reclamation
  • Maximizes resource utilization

SLIDE 64

Topics for further discussion

  • Features and technical details of HYDRAstor
  • Sales of HYDRAstor in Poland
  • Cooperation with 9LivesData on other projects

SLIDE 65

Questions?

Contact: heldt@9livesdata.com www.9livesdata.com www.hydrastor.com