SLIDE 1

NFSv4 Replication for Grid Storage Middleware

Peter Honeyman

Center for Information Technology Integration, University of Michigan, Ann Arbor

Workshop on Middleware for Grid Computing, November 27, 2006

SLIDE 2

Acknowledgements

- Joint work with Jiaying Zhang
- Partially supported by
  - NSF Middleware Initiative grant SCI-0438298
  - Network Appliance, Inc.

SLIDE 3

Outline

- Motivation
- Design
- Evaluation

SLIDE 4

Motivation

- Emerging global scientific collaborations
  - Access to widely distributed data must be reliable, efficient, and convenient
- Current solution: GridFTP
  - Shared data sets are synchronized manually
- Our solution: NFSv4.r
  - Replicated file system
  - Excellent performance
  - Conventional file system semantics

SLIDE 5

Usage Scenario

[Diagram: a scientist, a visualization center, a personal cluster, and a massive cluster of high performance computers all access a single file server across the WAN]

SLIDE 6

Usage Scenario

[Diagram: the same sites, each served by a nearby file replication server across the WAN; every replica exports the same path, /nfs/user/bob/exp1]

SLIDE 7

Outline

- Motivation
- Design
  - Global name space
  - Consistent mutable replication
- Evaluation

SLIDE 8

Global Name Space

- /nfs is the global root of all NFS file systems
- Entries under /nfs are mounted on demand
- The format of reference names under /nfs follows DNS conventions
  - E.g.: /nfs/umich.edu/lib/file1 (see the sketch below)
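As a purely illustrative sketch of this naming convention, a client could decompose a reference name into the DNS domain that identifies the serving site and the path relative to that site's export. The helper below is invented for this example and is not part of NFSv4.r:

    # Illustrative only: split a global /nfs reference name into the DNS
    # domain that locates the serving site and the site-relative path.
    def split_global_name(path: str):
        parts = path.strip("/").split("/")
        if len(parts) < 2 or parts[0] != "nfs":
            raise ValueError("not a global /nfs reference name")
        domain = parts[1]                      # e.g. "umich.edu"
        relative = "/" + "/".join(parts[2:])   # e.g. "/lib/file1"
        return domain, relative

    print(split_global_name("/nfs/umich.edu/lib/file1"))
    # -> ('umich.edu', '/lib/file1')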

SLIDE 9

Extended Use of DNS

- DNS SRV resource records carry NFS server location information
- The corresponding name server maps a logical name to some NFS servers
- Client-side utility enables transparent access to the global name space (sketched below)
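A hedged sketch of what the client-side lookup might look like, assuming the third-party dnspython package; the "_nfs4._tcp" SRV service label is an assumption made for this example, not a detail taken from the slides:

    # Illustrative only: query DNS SRV records to find NFS servers for a
    # domain. Requires dnspython; the service label is an assumption.
    import dns.resolver

    def locate_nfs_servers(domain: str):
        answers = dns.resolver.resolve(f"_nfs4._tcp.{domain}", "SRV")
        # Each record names a host and port; lower priority is preferred.
        return sorted((r.priority, str(r.target).rstrip("."), r.port)
                      for r in answers)

    # for priority, host, port in locate_nfs_servers("umich.edu"):
    #     print(host, port)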

SLIDE 10

Outline

- Motivation
- Current work
  - Global Name Space
  - Consistent Mutable Replication
- Evaluation

SLIDE 11

Why Replication?

- Performance
  - Access distributed data from nearby or lightly loaded servers
- Failure resilience
  - Users and applications can switch from a failed replication server to a working one

SLIDE 12

Replication in Practice

- Read-only replication
  - E.g., AFS
  - Does not support complex data sharing, e.g. concurrent writes
  - Lacks network transparency for writes
- Optimistic replication
  - E.g., Coda
  - Focus is availability, not consistency

SLIDE 13

Consistent Mutable Replication

- Problem: the state of the practice in file system replication does not satisfy the requirements of global scientific collaborations
- Solution: consistent mutable replication
  - Question: can it provide Grid applications efficient and reliable data access?

SLIDE 14

Requirements

- A server-to-server replication protocol
- Optimal read-only behavior
  - Performance identical to unreplicated system
- Consistent write behavior
  - Dynamically elect a primary server to coordinate concurrent writes
- Close-to-open semantics
  - An application opening a file sees the data written by the last application that wrote & closed the file

SLIDE 15

Replication Control: open

When a client opens a file for writing, other replication servers are instructed to forward writes. The selected server temporarily becomes the primary for that file.

SLIDE 16

Replication Control: write

The primary server asynchronously distributes updates to other servers during file modification.

SLIDE 17

Replication Control: close

After the active replication servers are synchronized, the primary server distributes the active view and withdraws as the primary server for the file.
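The three slides above describe a per-file open/write/close control flow. The sketch below is a minimal, single-process illustration of that flow; all class and method names are invented for the example, and updates that the real system distributes asynchronously are applied synchronously here:

    # Illustrative sketch of per-file replication control; every name here
    # is invented for the example.
    class Replica:
        def __init__(self, name):
            self.name = name
            self.files = {}        # path -> data held by this replica
            self.forward_to = {}   # path -> primary currently controlling it

        def apply_update(self, path, data):
            self.files[path] = data

    class Primary(Replica):
        def open_for_write(self, path, others):
            # open: other replicas are told to forward writes for the file;
            # this server temporarily becomes its primary.
            for r in others:
                r.forward_to[path] = self
            self.controlled = (path, list(others))

        def write(self, path, data):
            # write: apply locally, then distribute the update to the other
            # servers (asynchronously in the real system).
            self.apply_update(path, data)
            for r in self.controlled[1]:
                r.apply_update(path, data)

        def close(self, path):
            # close: once the active replicas are synchronized, withdraw as
            # primary for the file.
            for r in self.controlled[1]:
                r.forward_to.pop(path, None)
            self.controlled = None

    a, b, c = Primary("A"), Replica("B"), Replica("C")
    a.open_for_write("/nfs/user/bob/exp1", [b, c])
    a.write("/nfs/user/bob/exp1", b"results")
    a.close("/nfs/user/bob/exp1")
    assert b.files["/nfs/user/bob/exp1"] == b"results"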

SLIDE 18

Consistency

- View-based control (El Abbadi, Skeen, and Cristian) guarantees sequential consistency
  - A server becomes primary server after collecting acknowledgements from a majority of replication servers (see the sketch below)
- A primary server must ensure that every active replication server has acknowledged its role when a written file is closed
  - Guarantees close-to-open semantics
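A minimal sketch of the majority test behind primary election under the view-based scheme cited above; the function is illustrative rather than taken from the implementation:

    # Illustrative only: a candidate may assume the primary role for a file
    # once a majority of the replication servers have acknowledged it.
    def can_become_primary(acks_received: int, total_servers: int) -> bool:
        return acks_received > total_servers // 2

    assert can_become_primary(3, 5)        # 3 of 5 is a majority
    assert not can_become_primary(2, 5)    # 2 of 5 is not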

SLIDE 19

Replication Server Failure

- Every server keeps track of the per-file liveness of other servers (the active view, sketched below)
- The primary server removes from the active view any server that fails to respond
- The primary server sends other servers the active view before releasing its role
- Active servers refuse any request that comes from a server not in the active view
- A failed replication server can rejoin the active group only after it synchronizes
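A hedged sketch of the active-view bookkeeping listed above; the class is invented for illustration and is not the server's actual data structure:

    # Illustrative only: per-file active-view bookkeeping. A server outside
    # the active view must synchronize before it may rejoin.
    class ActiveView:
        def __init__(self, servers):
            self.members = set(servers)

        def drop_unresponsive(self, server):
            # The primary removes a server that fails to respond.
            self.members.discard(server)

        def accepts_request_from(self, server):
            # Active servers refuse requests from servers not in the view.
            return server in self.members

        def rejoin(self, server, synchronized: bool):
            # A failed replica rejoins only after it has synchronized.
            if synchronized:
                self.members.add(server)

    view = ActiveView({"A", "B", "C"})
    view.drop_unresponsive("C")
    assert not view.accepts_request_from("C")
    view.rejoin("C", synchronized=True)
    assert view.accepts_request_from("C")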

SLIDE 20

Primary Server Failure

- File becomes inaccessible
- Modification to El Abbadi et al. to allow asynchronous update
  - Ensures durability of data written by a client and acknowledged by the server
- Clients can continue to access objects that are outside the control of the failed server
- Applications decide whether to wait for the failed server to recover or to reproduce the computation results

SLIDE 21

Hierarchical Replication Control

- Primary server election is costly over WAN
- Heuristic: hierarchical replication control
  - A primary server can assert control at different granularities
  - Reduces costly elections when there is locality of reference

SLIDE 22

Shallow Control

[Diagram: directory tree /usr containing bin and local]

- A server with shallow control on a file or directory is the primary server for that single object

SLIDE 23

Deep Control

[Diagram: the same directory tree, /usr containing bin and local]

- A server with deep control on a directory is the primary server for everything in the subtree rooted at that directory (see the sketch below)
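A minimal sketch of the granularity check implied by the shallow and deep control slides; the sets, paths, and function here are invented for illustration:

    # Illustrative only: a server controls a path if it holds shallow
    # control of that single object or deep control of an ancestor.
    def controls(path, shallow, deep):
        if path in shallow:                    # shallow: that object only
            return True
        return any(path == d or path.startswith(d.rstrip("/") + "/")
                   for d in deep)              # deep: whole subtree

    shallow = {"/usr/bin/gcc"}
    deep = {"/usr/local"}
    assert controls("/usr/bin/gcc", shallow, deep)          # shallow control
    assert controls("/usr/local/lib/file1", shallow, deep)  # under deep control
    assert not controls("/usr/bin/ld", shallow, deep)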

SLIDE 24

Outline

- Motivation
- Current work
- Evaluation

SLIDE 25

NAS Grid Benchmarks

- An evaluation tool released by NASA for Grid computing
- An instance of NGB specifies
  - class (mesh size, number of iterations)
  - source(s) of input data
  - consumer(s) of solution values

SLIDE 26

Four NGB Problems

[Diagram: dataflow graphs of the four problems, each a Launch/Report workflow built from the NAS kernels BT, SP, LU, MG, and FT]

- Embarrassingly Distributed (ED)
- Helical Chain (HC)
- Visualization Pipe (VP)
- Mixed Bag (MB)

SLIDE 27

Experiment Setup

SLIDE 28

Helical Chain Small

SLIDE 29

Helical Chain Medium

SLIDE 30

Helical Chain Large

SLIDE 31

Helical Chain Huge

SLIDE 32

Visualization Pipe Small

SLIDE 33

Visualization Pipe Medium

SLIDE 34

Visualization Pipe Large

SLIDE 35

Visualization Pipe Huge

SLIDE 36

Conclusion

- Conventional wisdom
  - Consistent mutable replication in large-scale distributed storage systems is too expensive to consider
- Our experiments prove otherwise
  - Consistent mutable replication in large-scale distributed storage systems is feasible and practical
  - Superior performance
  - Rigorous adherence to ordinary semantics

SLIDE 37

Thank you for your attention! Questions?!

www.citi.umich.edu