RozoFS: The Scalable Distributed File System based on Erasure Coding, Dimitri Pertin (@denaitre), #rozofs



slide-1
SLIDE 1

#rozofs

Dimitri Pertin @denaitre


slide-2
SLIDE 2

RozoFS: The Scalable Distributed File System based on Erasure Coding

Available on https://github.com/rozofs/rozofs

slide-3
SLIDE 3

Distributed Storage Systems


slide-4
SLIDE 4

[Figure: block layouts across disks for the RAID levels listed below; the last layout includes P and Q parity blocks]

Distributed Storage Systems

Goal: Improve storage protection and/or performance

RAID controllers for local data distribution over disks:
  RAID-0: improves performance, no protection;
  RAID-1: improves protection, bad performance;
  RAID-6: trade-off between protection and performance.
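As a rough illustration of the first two levels, the sketch below places a list of blocks on disks by striping (RAID-0) and by mirroring (RAID-1). The function names are hypothetical, and real controllers work on fixed-size stripes, not Python lists; RAID-6 parity is not shown.

```python
def raid0_layout(blocks, disks):
    """Striping: block i goes to disk (i mod disks); no redundancy."""
    layout = [[] for _ in range(disks)]
    for i, block in enumerate(blocks):
        layout[i % disks].append(block)
    return layout


def raid1_layout(blocks, disks):
    """Mirroring: every disk holds a full copy of all blocks."""
    return [list(blocks) for _ in range(disks)]


blocks = ["A", "B", "C", "D"]
striped = raid0_layout(blocks, 2)
mirrored = raid1_layout(blocks, 2)
print(striped)   # each disk holds half of the blocks
print(mirrored)  # each disk holds all of the blocks
```

With striping, losing one disk loses half of the blocks; with mirroring, either disk alone still holds everything, at twice the storage cost.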

slide-5
SLIDE 5

Distributed Storage Systems

Distributed storage systems for network data distribution

A new client node joins the storage network.

slide-6
SLIDE 6

RozoFS File System

A Unique Namespace relying on several storage nodes

A POSIX Distributed File System that can be simultaneously mounted by multiple clients and provides:
  Scalability;
  Flexibility and heterogeneity;
  Access/Location transparency;
  Data protection by an erasure code.

slide-7
SLIDE 7

Fault Tolerance


slide-8
SLIDE 8

Fault Tolerance

Distributed storage systems for network data distribution

Write redundant information over nodes.

slide-9
SLIDE 9

Fault Tolerance

Distributed storage systems for network data distribution

Reading a subset is sufficient.

slide-10
SLIDE 10

Fault Tolerance

Distributed storage systems for network data distribution

Face node/link/matrix failures.
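The three steps above (write redundant information, read a subset, survive a failure) can be illustrated with the simplest possible scheme: two data blocks plus one XOR parity block spread over three nodes. This is only a toy sketch with hypothetical function names, not the code used by RozoFS.

```python
def xor_bytes(x, y):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def encode(a, b):
    """Return the blocks for nodes 0, 1, 2: a, b and their XOR parity."""
    return {0: a, 1: b, 2: xor_bytes(a, b)}


def decode(available):
    """Rebuild (a, b) from any two of the three stored blocks."""
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available:
        # b is missing: b = a XOR parity
        return available[0], xor_bytes(available[0], available[2])
    # a is missing: a = b XOR parity
    return xor_bytes(available[1], available[2]), available[1]


stored = encode(b"rozo", b"fs!!")
recovered = []
for lost in range(3):
    # Simulate the failure of one node and read from the two survivors.
    survivors = {i: blk for i, blk in stored.items() if i != lost}
    recovered.append(decode(survivors))
print(recovered)
```

Whichever single node is lost, the two remaining blocks are enough to return the original data.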

slide-11
SLIDE 11

Fault Tolerance

Data Replication (3 copies)

Remarks:
  Does not need any computation;
  But is very expensive;
  Three copies cost 3 times the original amount of information.

slide-12
SLIDE 12

Fault Tolerance

Data Replication (3 copies)


slide-13
SLIDE 13

Problem?

slide-14
SLIDE 14

Distributed Storage Systems

What is the problem?

The Digital Universe in 2020, J. Gantz and D. Reinsel (2012).

slide-15
SLIDE 15

Distributed Storage Systems

What is the problem?

Data protection plays a major role in storage consumption:
  The amount of information individuals create themselves - writing documents, taking pictures, downloading music, etc. - is far less than the amount of information being created about them in the digital universe.
  The proportion of data in the digital universe that requires protection is growing faster than the digital universe itself, from less than a third in 2010 to more than 40% in 2020.

The Digital Universe in 2020, J. Gantz and D. Reinsel (2012).

slide-16
SLIDE 16

Erasure Coding


slide-17
SLIDE 17

Data Protection by Erasure Coding

(6,4) Erasure Encoding

[Figure: data flow from k data blocks to n parity blocks]

Remarks

Optimal (MDS) codes decode from any subset of $k$ parity blocks out of $n$;
The system can face $n - k = 2$ failures;
The storage overhead is $n/k = 1.5$.
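The figures for the (6,4) code on this slide follow directly from n = 6 and k = 4. The short sketch below recomputes them and also counts how many distinct subsets of parity blocks are sufficient for decoding; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def mds_summary(n, k):
    """Basic properties of an optimal (MDS) code with n blocks, k needed."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return {
        "tolerated_failures": n - k,
        "storage_overhead": n / k,
        "decodable_subsets": comb(n, k),
    }


summary = mds_summary(6, 4)
print(summary)

# Enumerate the subsets explicitly: every choice of 4 blocks out of 6 works.
subsets = list(combinations(range(6), 4))
print(len(subsets))
```

For n = 6 and k = 4 this gives 2 tolerated failures, an overhead of 1.5, and 15 different subsets of 4 blocks that each allow decoding.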

slide-18
SLIDE 18

Data Protection by Erasure Coding

(6,4) Erasure Decoding

[Figure: data flow between k parity blocks and k data blocks]

Remarks

Optimal (MDS) codes decode from any subset of $k$ parity blocks out of $n$;
The system can face $n - k = 2$ failures;
The storage overhead is $n/k = 1.5$.

slide-19
SLIDE 19

Data Protection by Erasure Coding

Comparison?

Data Replication by 3 versus (6,4) Erasure Code
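One way to make this comparison concrete is to compute, for both schemes, the storage overhead and the number of failures tolerated. The sketch below does so; the 100 TB of user data is an arbitrary, hypothetical figure chosen for illustration.

```python
def replication(copies):
    """Overhead and fault tolerance of plain replication."""
    return {"overhead": float(copies), "tolerated_failures": copies - 1}


def erasure_code(n, k):
    """Overhead and fault tolerance of an (n, k) MDS erasure code."""
    return {"overhead": n / k, "tolerated_failures": n - k}


user_data_tb = 100  # hypothetical amount of user data
schemes = {
    "replication by 3": replication(3),
    "(6,4) erasure code": erasure_code(6, 4),
}
raw_tb = {}
for name, scheme in schemes.items():
    raw_tb[name] = user_data_tb * scheme["overhead"]
    print(name, scheme, "raw storage:", raw_tb[name], "TB")
```

Both schemes survive the loss of any 2 storage nodes, but the (6,4) code needs 150 TB of raw storage where replication by 3 needs 300 TB.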

slide-20
SLIDE 20

The Mojette Transform


slide-21
SLIDE 21

The Mojette Transform

Presentation

The Mojette Transform is a linear operation based on discrete geometry;
Computes redundant information from user's data;
The algorithm relies only on additions.

Performance

Implementation uses fast XOR;
Encoding and decoding computations are transparent.

The Mojette Transform, Theory and Applications, J. Guédon (2009).
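A minimal sketch of one forward Mojette projection, assuming the Dirac form of the transform with XOR as the addition and the bin convention b = p*l - q*k for the element in row l and column k (conventions vary between papers). The function name and the tiny 2x3 grid are illustrative and are not taken from the RozoFS code; decoding is not shown.

```python
from math import gcd


def mojette_projection(grid, p, q):
    """XOR-based projection of a 2D grid along the direction (p, q)."""
    if q < 1 or gcd(p, q) != 1:
        raise ValueError("expected q >= 1 and gcd(p, q) == 1")
    rows, cols = len(grid), len(grid[0])
    # The bin index is linear in (l, k), so its extremes are at the corners.
    corners = [p * l - q * k for l in (0, rows - 1) for k in (0, cols - 1)]
    offset = -min(corners)
    bins = [0] * (max(corners) - min(corners) + 1)
    for l, row in enumerate(grid):
        for k, value in enumerate(row):
            # Every element on the same discrete line lands in the same bin.
            bins[p * l - q * k + offset] ^= value
    return bins


grid = [[1, 2, 3],
        [4, 5, 6]]
proj_0 = mojette_projection(grid, 0, 1)
proj_1 = mojette_projection(grid, 1, 1)
proj_m1 = mojette_projection(grid, -1, 1)
print(proj_0, proj_1, proj_m1)
```

Each projection only accumulates values along parallel lines, which is why the computation reduces to additions (here XOR). Projections taken along different directions carry the redundancy; for directions of the form (p, 1), the number of bins is (cols - 1) + (rows - 1) * |p| + 1.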

slide-22
SLIDE 22

The Mojette Transform

Protection in Storage Systems

[Figure: a file (48 kB) is split into chunks (4 kB); each chunk gives k=4 data blocks (1 kB), encoded into n=6 parity blocks (1 kB)]

The Mojette Transform (MT) is applied on data blocks to produce a set of parity blocks;
Parity blocks are distributed over storage nodes;
Any subset of 4 parity blocks out of the 6 is sufficient to decode ($k = 4$, $n = 6$).
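The sizes on this slide can be checked with a short sketch that cuts a 48 kB file into 4 kB chunks and each chunk into k = 4 data blocks of 1 kB. The function name and the zero padding of a short final chunk are assumptions made for the example; the encoding of each chunk into n = 6 parity blocks is only counted here, not performed.

```python
CHUNK_SIZE = 4 * 1024          # 4 kB chunks, as on the slide
K = 4                          # data blocks per chunk
N = 6                          # parity blocks produced per chunk
BLOCK_SIZE = CHUNK_SIZE // K   # 1 kB data blocks


def split_file(data):
    """Return a list of chunks, each a list of K data blocks."""
    chunks = []
    for start in range(0, len(data), CHUNK_SIZE):
        # Assumption: a short final chunk is padded with zero bytes.
        chunk = data[start:start + CHUNK_SIZE].ljust(CHUNK_SIZE, b"\0")
        blocks = [chunk[i:i + BLOCK_SIZE]
                  for i in range(0, CHUNK_SIZE, BLOCK_SIZE)]
        chunks.append(blocks)
    return chunks


file_data = bytes(48 * 1024)   # a 48 kB file filled with zeros
chunks = split_file(file_data)
parity_blocks_total = len(chunks) * N
print(len(chunks), len(chunks[0]), len(chunks[0][0]), parity_blocks_total)
```

A 48 kB file therefore yields 12 chunks, each with 4 data blocks of 1024 bytes, and 72 parity blocks in total to distribute over the storage nodes.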

slide-23
SLIDE 23

Architecture of RozoFS


slide-24
SLIDE 24

Architecture of RozoFS

Metadata Server: exportd service

Stores metadata (data about user data):
  POSIX information (e.g. size, permissions, timestamps);
  RozoFS related information (e.g. data location).
Knows the position of data blocks:
  answers data location requests on reads;
  answers where to store projections on writes.
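The two roles listed above (keeping per-file metadata and answering location requests) can be pictured with a toy in-memory model. Everything in this sketch is hypothetical: the class names, the fields, and the naive choice of the first n storage nodes are assumptions made for illustration and do not describe the actual exportd service.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """POSIX-style attributes plus the location of the projections."""
    size: int = 0
    mode: int = 0o644
    mtime: float = field(default_factory=time.time)
    projection_nodes: list = field(default_factory=list)


class ToyMetadataServer:
    """Hypothetical stand-in for a metadata service."""

    def __init__(self, storage_nodes, n):
        if n > len(storage_nodes):
            raise ValueError("not enough storage nodes for n projections")
        self.storage_nodes = list(storage_nodes)
        self.n = n
        self.files = {}

    def where_to_write(self, path):
        """Answer where the n projections of a file should be stored."""
        meta = self.files.setdefault(path, FileMetadata())
        if not meta.projection_nodes:
            # Naive placement: always the first n nodes (assumption).
            meta.projection_nodes = self.storage_nodes[:self.n]
        return meta.projection_nodes

    def where_to_read(self, path):
        """Answer where the projections of an existing file are located."""
        return self.files[path].projection_nodes


server = ToyMetadataServer(["node%d" % i for i in range(1, 9)], n=6)
targets = server.where_to_write("/movie.mov")
print(targets)
print(server.where_to_read("/movie.mov") == targets)
```

A client would ask such a service before writing or reading, then talk to the storage nodes directly.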

slide-25
SLIDE 25

Architecture of RozoFS

Storage Servers: storaged daemon

Hold a storaged daemon that manages:
  data storing;
  data retrieval;
  data accessibility.
Data can be stored on:
  local file system (ext4, xfs, etc.) or remote Amazon bucket;
  native or other protocol (CIFS, AFP, etc.).

slide-26
SLIDE 26

Architecture of RozoFS

Clients

Rely on FUSE (rozofsmount):
  mounts RozoFS locally;
  transparently translates user actions for the network system.
Manage encoding (write) and decoding (read).

slide-27
SLIDE 27

Production Use Example

[Figure: production deployment. Six groups of services, each with rozofsmount, storaged and AFP; two of them also include exportd, labelled (A) and (P). Other labels: Gigabit Ethernet (twice), FCP1 to FCP5, RozoFS Global Namespace.]

slide-28
SLIDE 28

Academic Use Example


slide-29
SLIDE 29

Thanks!

Contribute: https://github.com/rozofs/rozofs
Contact me at: @denaitre or dimitri.pertin@univ-nantes.fr
Have a look at the ANR FEC4Cloud project
