National Data Storage – architecture and mechanisms – PowerPoint PPT Presentation

SLIDE 1

National Data Storage – architecture and mechanisms

Michał Jankowski, Maciej Brzeźniak, PSNC

SLIDE 2

Agenda

  • Introduction
  • Assumptions
  • Architecture
  • Main components
  • Deployment
  • Use case

SLIDE 3

The problem

Data storage:

  • needs considerable resources (human, software, hardware…)
  • is complex and expensive
  • exceeds the abilities of many institutions
  • is not their core business

Outsourcing the process may be the only, or at least the most reasonable, solution to that problem!

SLIDE 4

Our project

  • KMD (NDS) – National Data Storage
– 2007-2009 – R&D project that implemented the software
  • PLATON-U4 – "Popular backup/archival service"
– 2009-2012 – Deployment project
– Target: scientific and academic institutions

SLIDE 5

Aims

Primary aim:

To support the scientific and academic community in protecting and archiving their data

Secondary aims:

– Physical protection of the data
– Assuring logical consistency of the data
– Long-term data archival
– Tools supporting backup

SLIDE 6

Our potential customers

  • Digital libraries
  • Virtual laboratories
  • Academic computer centres and network operators
  • Research institutions
  • Universities
  • Clinical hospitals

[Diagram: estimated scale of the potential customer base – on the order of 500-600 organizations and hundreds of TB of data per year]

SLIDE 7

Design assumptions I

  • High availability and reliability
– Geographically distributed storage system with data replication
– Additional benefit: scalability (performance, storage capacity, number of users)
– Challenges: consistency, fault tolerance and high performance

SLIDE 8

Design assumptions II

  • Focus on specific system features and functionality (pointed out by the potential users in a survey)
– Secondary data storage
– Data durability and service availability
– Geographical data replication
– No data sharing or exchange capabilities
– Confidentiality of the data -> dedicated namespaces
– Automatic replication according to the preferred policy (see the sketch below):
  • Number of replicas
  • Synchronous/asynchronous mode
  • Allowed physical locations
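
Such a per-contract policy is easy to picture as a small data structure. The following is a minimal sketch under assumed names (ReplicationPolicy, choose_targets and the site codes are invented for illustration; this is not the actual NDS implementation):

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicationMode(Enum):
    SYNCHRONOUS = "sync"        # replicas written before the operation is acknowledged
    ASYNCHRONOUS = "async"      # replicas created in the background


@dataclass
class ReplicationPolicy:
    """Per-contract replication policy covering the three items listed above."""
    num_replicas: int                                     # number of replicas
    mode: ReplicationMode                                 # synchronous/asynchronous mode
    allowed_locations: set = field(default_factory=set)  # allowed physical locations

    def choose_targets(self, nodes_by_location: dict) -> list:
        """Pick storage nodes located only in the allowed locations."""
        candidates = [node for node, loc in nodes_by_location.items()
                      if loc in self.allowed_locations]
        if len(candidates) < self.num_replicas:
            raise RuntimeError("not enough storage nodes in the allowed locations")
        return candidates[:self.num_replicas]


# Example: two synchronous replicas, restricted to two (invented) sites.
policy = ReplicationPolicy(2, ReplicationMode.SYNCHRONOUS, {"poznan", "krakow"})
print(policy.choose_targets({"sn-1": "poznan", "sn-2": "krakow", "sn-3": "gdansk"}))
```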

SLIDE 9

Design assumptions III

  • Realism about what we are actually able to provide
– stable production-level service
– budget and time limitations

SLIDE 10

Overall architecture

[Architecture diagram: users reach the system through Access Nodes, which run the access-method servers (SSH, HTTPS, WebDAV...), the NDS system logic and the virtual file systems for data and metadata. Database Nodes hold the metadata DB, the users DB and the accounting & limits DB. Storage Nodes provide the replica file systems and replica access-method servers; data is replicated between Storage Nodes.]
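
To make the roles in the diagram concrete, here is a purely hypothetical sketch of the write path; the class and method names mirror the diagram but are not taken from the NDS code:

```python
# Hypothetical write path through the roles shown in the diagram:
# user -> Access Node -> metadata DB (Database Node) + replicas on Storage Nodes.

class StorageNode:
    def __init__(self, name):
        self.name = name
        self.blobs = {}                      # replica file system, simplified to a dict

    def store_replica(self, logical_path, data):
        self.blobs[logical_path] = data


class DatabaseNode:
    def __init__(self):
        self.metadata = {}                   # metacatalog: logical file -> replica locations

    def register(self, logical_path, nodes):
        self.metadata[logical_path] = [n.name for n in nodes]


class AccessNode:
    """NDS system logic behind the access-method servers (SSH, HTTPS, WebDAV...)."""
    def __init__(self, db_node, storage_nodes):
        self.db = db_node
        self.storage_nodes = storage_nodes

    def write(self, logical_path, data, num_replicas=2):
        targets = self.storage_nodes[:num_replicas]
        for node in targets:                           # replication across Storage Nodes
            node.store_replica(logical_path, data)
        self.db.register(logical_path, targets)        # update the metacatalog


an = AccessNode(DatabaseNode(), [StorageNode("sn-1"), StorageNode("sn-2")])
an.write("/contract-A/backups/db.dump", b"...")
```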

SLIDE 11

Metacatalog

  • Logical structure of the virtual file system
  • Attributes and other metadata of files
  • Mapping of logical files to replicas
  • History of operations
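
A minimal relational sketch of such a metacatalog could look as follows; the table and column names are illustrative assumptions, not the actual NDS schema:

```python
import sqlite3

# Illustrative metacatalog schema covering the four points above;
# the real metacatalog runs on a full RDBMS and is much richer than this.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (                  -- logical structure of the virtual file system
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    name      TEXT NOT NULL,
    is_dir    INTEGER NOT NULL
);
CREATE TABLE file_attr (             -- attributes and other metadata of files
    node_id   INTEGER REFERENCES node(id),
    key       TEXT NOT NULL,
    value     TEXT
);
CREATE TABLE replica (               -- mapping of logical files to replicas
    node_id       INTEGER REFERENCES node(id),
    storage_node  TEXT NOT NULL,
    physical_path TEXT NOT NULL
);
CREATE TABLE op_history (            -- history of operations
    id           INTEGER PRIMARY KEY,
    node_id      INTEGER REFERENCES node(id),
    operation    TEXT NOT NULL,
    performed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
```
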
SLIDE 12

Logical separation of namespaces

  • Each customer’s contract is connected with a separate virtual file system (namespace)
  • Data sharing is not expected by the users
  • Confidentiality is improved
  • Logically separate namespaces mean physically separated metacatalogs
– Improved performance and scalability
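
In practice this means each request is routed to the metacatalog of the contract’s namespace. A tiny sketch of such routing, with invented contract names and connection strings:

```python
# One metacatalog database per contract/namespace (connection strings are placeholders).
METACATALOGS = {
    "contract-A": "postgresql://db-node-1/mc_contract_a",
    "contract-B": "postgresql://db-node-2/mc_contract_b",
}

def metacatalog_for(contract_id: str) -> str:
    """Return the physically separate metacatalog serving this namespace."""
    try:
        return METACATALOGS[contract_id]
    except KeyError:
        raise LookupError(f"unknown contract: {contract_id}") from None
```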

SLIDE 13

Distribution of metadata

  • Each metacatalog is replicated asynchronously in master-slaves mode (Slony-I)
  • The number of MC replicas corresponds to the number of replicas of the user files
  • In case of failure of the master MC, one of the slaves is (manually) selected as the new master

SLIDE 14

Semi-synchronous metadata replication

  • Used in the synchronous mode of replication of user data
  • All operations on metadata are synchronously logged to a number of distributed logs
  • In case of failure: all operations logged between the last update of the slave MC and the failure of the master MC are performed on the "new" master
  • As safe as synchronous database replication, but much lighter (see the sketch below)
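
The mechanism can be sketched roughly as follows: every metadata operation is first appended synchronously to several distributed logs, the slaves are updated asynchronously, and after a master failure the promoted slave replays the operations it has not yet received. All names below are assumptions made for illustration, not the NDS code:

```python
class SemiSyncMetacatalog:
    """Sketch of semi-synchronous metadata replication: operations are synchronously
    appended to distributed logs, while slave metacatalogs are updated asynchronously."""

    def __init__(self, master, slaves, logs):
        self.master = master          # master metacatalog (dict as a stand-in for a DB)
        self.slaves = slaves          # asynchronously replicated slave metacatalogs
        self.logs = logs              # distributed operation logs (lists on other hosts)
        self.applied = 0              # how many logged operations reached the slaves

    def apply(self, op):
        for log in self.logs:         # synchronous step: log the operation everywhere
            log.append(op)
        key, value = op
        self.master[key] = value      # then apply it to the master metacatalog

    def replicate_to_slaves(self):
        """Asynchronous master -> slaves replication (Slony-I style)."""
        for key, value in self.logs[0][self.applied:]:
            for slave in self.slaves:
                slave[key] = value
        self.applied = len(self.logs[0])

    def failover(self):
        """Promote a slave and replay the operations logged after its last update."""
        new_master = self.slaves.pop(0)
        for key, value in self.logs[0][self.applied:]:
            new_master[key] = value
        self.master = new_master
        return new_master


mc = SemiSyncMetacatalog(master={}, slaves=[{}, {}], logs=[[], [], []])
mc.apply(("/contract-A/file1", {"size": 42}))
new_master = mc.failover()            # master lost before the slaves were updated
assert new_master["/contract-A/file1"] == {"size": 42}
```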

SLIDE 15

Users database

  • Institutions (customers)
  • Contracts and profiles (parameters of services)
– Required number and location of replicas
– Mode of replication (synchronous, asynchronous)
– …
  • Users (certificates)

SLIDE 16

Accounting database

  • Resource usage
– Statistics
– Billing
  • Limits (quota)

SLIDE 17

Data Daemon

  • Emulates virtual file system with logical files and directories on AN
  • Enforces security policies and replication policies
  • Takes into account output of monitoring and prediction modules
  • Produces accounting data
  • The virtual FS can be accessed in a standard way or via a portal (universal interface!)
  • Based on FUSE (see the sketch below)
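
As an illustration of the FUSE approach (not the NDS Data Daemon itself), a toy read-only virtual file system using the fusepy bindings might look like this; the mount point and file contents are placeholders:

```python
import errno, stat
from fuse import FUSE, Operations, FuseOSError   # fusepy package


class TinyLogicalFS(Operations):
    """Read-only toy virtual FS exposing one logical file, in the spirit of the Data Daemon."""
    FILES = {"/hello.txt": b"data served by the virtual file system\n"}

    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path in self.FILES:
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(self.FILES[path]))
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", ".."] + [name.lstrip("/") for name in self.FILES]

    def read(self, path, size, offset, fh):
        return self.FILES[path][offset:offset + size]


if __name__ == "__main__":
    # Mount point is an example; the real daemon would additionally enforce security
    # and replication policies and produce accounting data.
    FUSE(TinyLogicalFS(), "/mnt/nds", foreground=True)
```
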
SLIDE 18

Metadata Daemon

  • Emulates a virtual file system with metadata on AN
  • Metadata is placed in special files located in directories corresponding to logical files and directories
  • The virtual FS can be accessed in a standard way or via a portal
  • Based on FUSE

SLIDE 19

System interfaces

  • Low level: virtual file systems for data and metadata
  • High level: standard protocols – SSH, HTTPS, WebDAV, GridFTP (see the example below)
– Limitation: authorization - keyFS
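
Because the high-level interfaces are standard protocols, data can be moved with completely ordinary clients. For example, a WebDAV upload is just an HTTP PUT; the endpoint URL and credentials below are placeholders, not real NDS addresses:

```python
import requests

# Hypothetical WebDAV endpoint of an access node; URL and credentials are placeholders.
url = "https://access-node.example.org/webdav/backups/db.dump"

with open("db.dump", "rb") as f:
    # A WebDAV upload is a plain HTTP PUT, so any standard client will do.
    resp = requests.put(url, data=f, auth=("user", "secret"), timeout=60)
resp.raise_for_status()
```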

SLIDE 20

Data access

  • Typical client software
  • Specialized portal

SLIDE 21

Monitoring and prediction

  • Monitoring of all important elements allows the Data Daemon to avoid inaccessible SNs and enables quick reaction by the administrators
  • Prediction helps with optimal selection of the replica to read, or of the node on which to write a new replica (see the sketch below)
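
A simplified version of such a selection step is sketched below; the monitoring fields, node names and the scoring rule (lowest predicted load among reachable nodes) are assumptions made for the illustration:

```python
def pick_read_replica(replicas, monitoring):
    """Choose a replica on a reachable storage node with the lowest predicted load.

    `replicas` maps storage-node name -> physical path;
    `monitoring` maps storage-node name -> dict with 'up' and 'predicted_load'.
    """
    reachable = {node: path for node, path in replicas.items()
                 if monitoring.get(node, {}).get("up", False)}   # skip inaccessible SNs
    if not reachable:
        raise RuntimeError("no reachable replica")
    best = min(reachable, key=lambda node: monitoring[node]["predicted_load"])
    return best, reachable[best]


# Example data (purely illustrative):
replicas = {"sn-poznan": "/hsm/a/file", "sn-krakow": "/hsm/b/file"}
monitoring = {"sn-poznan": {"up": True, "predicted_load": 0.7},
              "sn-krakow": {"up": True, "predicted_load": 0.2}}
print(pick_read_replica(replicas, monitoring))   # -> ('sn-krakow', '/hsm/b/file')
```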

SLIDE 22

Data Storage

  • HSM (Hierarchical Storage Management)
SLIDE 23

From the user’s point of view…

SLIDE 24

Scalability

  • Storage Nodes
– Access performance
– System capacity
– Data transmission throughput (distributed data traffic)
  • Access Nodes
– Responsiveness to I/O requests
– Data transmission throughput
  • Metacatalogs
– Potential bottleneck
– Maximum number of files/directories
– Perform file system level operations
– Operation time depends on the database size
– Transactions -> limited parallel access
– Sensitive to metadata-intensive operations (backup/archive applications are rather throughput-intensive)

SLIDE 25

System instantiation

  • Separate metacatalogs allow for easy division of the system into many instances
  • Pools of access nodes and storage nodes may be assigned to the instances
  • The instances and their elements (metacatalogs, virtual file systems…) may be located on dedicated physical or virtualized servers, or may coexist
  • The configuration depends on the users’ requirements regarding data and metadata processing efficiency

SLIDE 26

System deployment infrastructure for PLATON

  • 12.5 PB tape storage in 5 locations
  • 2 PB disk storage in 10 locations
  • 70 servers, SANs and 10 Gbit Ethernet

SLIDE 27

Use case – storage of PIONIER network traffic

  • Store protocol headers for legal requirements
  • 168 TB/year -> 5.5 MB/sec (see the estimate below)
  • Data collected in >20 geographically distributed PIONIER nodes
  • 5-year durability (replication required)
  • Frequent data writes, rare metadata operations
  • Fits well into the PLATON environment:
– Many data sources -> multiple virtualized ANs
– Replicas -> multiple SNs
– Little metadata processing -> may use a shared DBMS
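
The 168 TB/year figure indeed corresponds to an average sustained rate of roughly 5-6 MB/s, depending on whether decimal or binary units are assumed; a quick check:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600                  # 31,536,000 s

decimal = 168e12 / SECONDS_PER_YEAR / 1e6           # TB = 10^12 B, MB = 10^6 B
binary = 168 * 2**40 / SECONDS_PER_YEAR / 2**20     # TiB and MiB

print(f"{decimal:.1f} MB/s, {binary:.1f} MiB/s")    # ~5.3 MB/s, ~5.6 MiB/s
```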

SLIDE 28

dCache

  • Distributed, heterogeneous, visible as a single virtual file system
  • Designed for a large number of users (scientists) accessing the same file system – exchange of data is important
  • Persistence model: defines how the data should be exchanged with tertiary storage, replicated, migrated to hot spots, recovered…
  • Access: dedicated protocol, NFS, FTP, HTTP, WebDAV, GridFTP
  • Metadata stored in a relational DB, metadata services may be organized hierarchically
  • Suitable for grid environments

SLIDE 29

iRods

  • Software for data grids, digital libraries, persistent archives, real-time systems
  • Data managed using a set of rules: replication, modes, security assumptions, access time, load balancing, failure recovery… (flexible, but complex)
  • Metacatalog: central, based on a relational database
  • Implemented as a set of services
  • Object oriented
  • Approved by NASA for commerce, but used mainly by R&D

SLIDE 30

Thank you!

Questions?

SLIDE 32

Some facts on massive data storage

  • 1800 EB globally / 260 GB per person to be produced in 2010*
  • Data produced in computer systems exceeds the available storage capacity
  • Managing, classifying, storing, and protecting data in the short and long term is complex and expensive!

*) source: IDC analysis. http://www.itnews.com.au