NFSv4 Replication for Grid Storage Middleware
Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor
November 27, 2006 Workshop on Middleware for Grid Computing 1
Acknowledgements
Joint work with Jiaying Zhang
Partially supported by
NSF Middleware Initiative grant SCI-0438298
Network Appliance, Inc.
Outline
Motivation
Design
Evaluation
Motivation
Emerging global scientific collaborations
Access to widely distributed data must be
reliable, efficient, and convenient
Current solution: GridFTP
Shared data sets are synchronized manually
Our solution: NFSv4.r
Replicated file system
Excellent performance
Conventional file system semantics
Usage Scenario
[Figure: a file server, a scientist, a visualization center, a personal cluster of high performance computers, and a massive cluster of high performance computers, all connected over a WAN]
Usage Scenario
[Figure: the same sites connected over a WAN, now served by three file replication servers; every site sees the same path, /nfs/user/bob/exp1]
Outline
Motivation
Design
Global name space
Consistent mutable replication
Evaluation
Global Name Space
/nfs is the global root of all NFS file systems
Entries under /nfs are mounted on demand
The format of reference names under /nfs follows DNS conventions
E.g., /nfs/umich.edu/lib/file1
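The example name suggests that the first component under /nfs is a DNS-style domain and that the remainder is a path within that domain's name space. The sketch below assumes that reading. split_global_name is a hypothetical helper written for illustration. It is not part of NFSv4.r.

```python
from pathlib import PurePosixPath

GLOBAL_ROOT = PurePosixPath("/nfs")  # global root named on the slide


def split_global_name(path: str) -> tuple[str, str]:
    """Split a global name into (DNS-style domain, path inside that domain).

    Hypothetical helper: assumes the first component under /nfs is a domain.
    """
    p = PurePosixPath(path)
    try:
        rel = p.relative_to(GLOBAL_ROOT)
    except ValueError:
        raise ValueError(f"{path!r} is not under {GLOBAL_ROOT}") from None
    if not rel.parts:
        raise ValueError("the global root itself names no domain")
    domain, *rest = rel.parts
    return domain, "/" + "/".join(rest)


print(split_global_name("/nfs/umich.edu/lib/file1"))  # ('umich.edu', '/lib/file1')
```

Under this assumption, an automounter would use the domain part to locate servers, for example through DNS, and the remaining path to walk the server's export.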
Extended Use of DNS
DNS SRV resource records carry NFS server
location information
The corresponding name server maps a
logical name to one or more NFS servers
A client-side utility enables transparent
access to the global name space
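The slides do not say how a client chooses among several servers when a name maps to more than one. The sketch below assumes the client follows the standard SRV selection rules of RFC 2782: lower priority values are tried first, and records of equal priority are ordered by a weighted random draw. The record values and host names are hypothetical, and no DNS lookup is performed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = preferred
    weight: int    # relative share among records of equal priority
    port: int
    target: str    # host name of an NFS server


def order_srv_records(records, rng=None):
    """Return records in the order a client should try them (RFC 2782 rules)."""
    rng = rng or random.Random()
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first; they are picked only when the draw is 0.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive of both ends
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


records = [
    SrvRecord(10, 60, 2049, "nfs1.example.edu"),   # hypothetical hosts
    SrvRecord(10, 40, 2049, "nfs2.example.edu"),
    SrvRecord(20, 0, 2049, "backup.example.edu"),
]
for rec in order_srv_records(records):
    print(rec.target, rec.port)
```

The two priority-10 servers appear in either order, with nfs1 first about 60% of the time. backup.example.edu is always tried last.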
Outline
Motivation
Current work
Global name space
Consistent mutable replication
Evaluation
Why Replication?
Performance
Access distributed data from nearby or lightly
loaded servers
Failure resilience
Users and applications can switch from a failed
replication server to a working one
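Both benefits reduce to a client-side choice among replicas. The sketch below assumes the client keeps a recent round-trip time for each replication server as a stand-in for "nearby or lightly loaded", plus a set of servers that have stopped responding. choose_replica and its inputs are hypothetical; the slides do not describe the client's selection policy.

```python
def choose_replica(latency_ms, failed):
    """Pick the live replication server with the lowest measured latency.

    latency_ms: dict mapping server name -> recent round-trip time in ms
    failed:     set of server names that have stopped responding
    """
    live = {s: t for s, t in latency_ms.items() if s not in failed}
    if not live:
        raise RuntimeError("no live replication server")
    return min(live, key=live.get)


servers = {"a": 80.0, "b": 5.0, "c": 40.0}
print(choose_replica(servers, set()))  # b
print(choose_replica(servers, {"b"}))  # c: fail over to the next-closest server
```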
Replication in Practice
Read-only replication
E.g., AFS
Does not support complex data sharing, e.g.,
concurrent writes
Lacks network transparency for writes
Optimistic replication
E.g., Coda
Focus is availability, not consistency
Consistent Mutable Replication
Problem: the state of the practice in file system
replication does not satisfy the requirements of global scientific collaborations
Solution: consistent mutable replication
Problem: can consistent mutable replication provide Grid
applications with efficient and reliable data access?
Requirements
A server-to-server replication protocol
Optimal read-only behavior
Performance identical to an unreplicated system
Consistent write behavior
Dynamically elect a primary server to coordinate
concurrent writes
Close-to-open semantics
An application opening a file sees the data written
by the last application that wrote and closed the file
Replication Control: open
When a client opens a file for writing, the other replication servers are instructed to forward writes to the selected server, which temporarily becomes the primary for that file
Replication Control: write
The primary server asynchronously distributes updates to the other servers during file modification
Replication Control: close
After the active replication servers are synchronized, the primary server distributes the active view and withdraws as the primary server for the file
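The open, write, and close steps can be traced in a toy, single-process model of one file. This is a simplified sketch under stated assumptions, not the NFSv4.r implementation, and every class and method name is hypothetical. Reachability stands in for a server answering the primary's request. Updates the primary distributes "asynchronously" are modeled as a per-replica pending queue; a real server would drain it in the background, whereas this toy drains it at close.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    name: str
    reachable: bool = True
    data: bytes = b""
    forward_to: Optional[str] = None                    # primary that receives forwarded writes
    pending: List[bytes] = field(default_factory=list)  # distributed but not yet applied
    active_view: frozenset = frozenset()                # last view received from a primary


class FileReplicationControl:
    """Hypothetical model of per-file replication control for one file."""

    def __init__(self, replicas):
        self.replicas = {r.name: r for r in replicas}
        self.primary = None
        self.view = set(self.replicas)

    def open_for_write(self, server):
        # Collect acknowledgements; the candidate counts itself.
        acks = {server} | {r.name for r in self.replicas.values() if r.reachable}
        if len(acks) <= len(self.replicas) // 2:
            raise RuntimeError(f"{server} lacks a majority; cannot become primary")
        for name in acks - {server}:
            self.replicas[name].forward_to = server  # forward writes to the primary
        self.primary = server
        self.view = acks  # servers that did not answer leave the active view

    def write(self, data):
        if self.primary is None:
            raise RuntimeError("file is not open for writing")
        self.replicas[self.primary].data += data
        for name in self.view - {self.primary}:
            self.replicas[name].pending.append(data)  # queued for asynchronous delivery

    def close(self):
        # Synchronize every active replica before releasing the primary role.
        for name in self.view - {self.primary}:
            r = self.replicas[name]
            r.data += b"".join(r.pending)
            r.pending.clear()
        for name in self.view:
            r = self.replicas[name]
            r.active_view = frozenset(self.view)  # distribute the active view
            r.forward_to = None
        self.primary = None
```

In a three-server setup where one server is unreachable, the other two still form a majority. That server simply drops out of the active view distributed at close.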
Consistency
View-based control (El Abbadi, Skeen, and
Cristian) guarantees sequential consistency
A server becomes the primary server after collecting
acknowledgements from a majority of replication servers
A primary server must ensure that every
active replication server has acknowledged its role when a written file is closed
Guarantees close-to-open semantics
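The majority rule is what prevents two servers from being primary for the same file at once. Assume each replication server acknowledges at most one primary per file at a time; this is an assumption stated here, not spelled out on the slide. Let A and B be the acknowledging sets of two would-be primaries among n servers. Because |A ∪ B| ≤ n, the two sets must share at least one server, and that server cannot have acknowledged both.

```latex
|A| > \frac{n}{2}, \quad |B| > \frac{n}{2}
\;\Longrightarrow\;
|A \cap B| = |A| + |B| - |A \cup B| > \frac{n}{2} + \frac{n}{2} - n = 0
```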
Replication Server Failure
Every server keeps track of the per-file
liveness of other servers (active view)
Primary server removes from the active
view any server that fails to respond
Primary server sends other servers the
active view before releasing its role
Active servers refuse any request that
comes from a server not in the active view
A failed replication server can rejoin the
active group only after it synchronizes
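The active-view rules on this slide can be sketched as a small per-file class. The class and method names are hypothetical. "Synchronized" is modeled by comparing an integer version number for the file, an assumption the slides do not make.

```python
class ActiveView:
    """One server's per-file record of which replication servers are live.

    Hypothetical sketch; synchronization is modeled as matching versions.
    """

    def __init__(self, servers):
        self.members = set(servers)

    def remove_unresponsive(self, server):
        # The primary drops any server that fails to respond.
        self.members.discard(server)

    def accept_request_from(self, server):
        # Active servers refuse requests from servers outside the view.
        return server in self.members

    def rejoin(self, server, server_version, current_version):
        # A failed server may rejoin only once it has caught up.
        if server_version != current_version:
            return False
        self.members.add(server)
        return True


view = ActiveView({"a", "b", "c"})
view.remove_unresponsive("c")
print(view.accept_request_from("c"))  # False
print(view.rejoin("c", 3, 5))         # False: still behind
print(view.rejoin("c", 5, 5))         # True: synchronized, back in the view
```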
Primary Server Failure
File becomes inaccessible
Modification to El Abbadi et al. to allow
asynchronous update
Ensures durability of data written by a client and
acknowledged by the server
Clients can continue to access objects that are
outside the control of the failed server
Applications decide whether to wait for the failed
server to recover or to reproduce the computation results
Hierarchical Replication Control
Primary server election is costly over
a WAN
Heuristic: hierarchical replication
control
A primary server can assert control at
different granularities
Reduces costly elections when there is
locality of reference
Shallow Control
[Figure: directory tree /usr with subdirectories bin and local]
A server with shallow control on a file or
directory is the primary server for that single object
Deep Control
[Figure: directory tree /usr with subdirectories bin and local]
A server with deep control on a directory is
the primary server for everything in the subtree rooted at that directory
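A server deciding whether a write needs a new election must first find out who, if anyone, already controls the target. The sketch below resolves that with a hypothetical controlling_primary helper. It assumes grants never conflict, so a deep grant on a directory excludes other grants inside its subtree. A grant of either kind covers the object it names. A deep grant on the nearest controlled ancestor covers its descendants, while a shallow grant never does.

```python
from pathlib import PurePosixPath


def controlling_primary(path, grants):
    """Return the server controlling `path`, or None if an election is needed.

    grants: dict mapping a controlled path to (server, kind), where kind is
    "shallow" (that single object) or "deep" (the whole subtree).
    """
    target = PurePosixPath(path)
    grant = grants.get(str(target))
    if grant is not None:
        return grant[0]  # either kind of control covers the object itself
    for ancestor in target.parents:  # nearest ancestor first
        grant = grants.get(str(ancestor))
        if grant is not None and grant[1] == "deep":
            return grant[0]
    return None


grants = {"/usr": ("s1", "shallow"), "/home/bob": ("s2", "deep")}
print(controlling_primary("/usr", grants))                 # s1
print(controlling_primary("/usr/bin", grants))             # None: shallow covers /usr only
print(controlling_primary("/home/bob/exp1/data", grants))  # s2: inside the deep subtree
```

With locality of reference, most writes under /home/bob find s2 already in control, so no WAN election is needed for them.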
Outline
Motivation
Current work
Evaluation
NAS Grid Benchmarks
An evaluation tool released by NASA for
Grid computing
An instance of NGB specifies:
Class (mesh size, number of iterations)
Source(s) of input data
Consumer(s) of solution values
Four NGB Problems
[Figure: data-flow graphs of the four NGB problems, Embarrassingly Distributed (ED), Helical Chain (HC), Visualization Pipe (VP), and Mixed Bag (MB); each graph runs from a Launch task to a Report task through BT, SP, LU, MG, and FT tasks]
Experiment Setup
Helical Chain (Small)
Helical Chain (Medium)
Helical Chain (Large)
Helical Chain (Huge)
Visualization Pipe (Small)
Visualization Pipe (Medium)
Visualization Pipe (Large)
Visualization Pipe (Huge)
Conclusion
Conventional wisdom
Consistent mutable replication in large-scale
distributed storage systems is too expensive to consider
Our experiments prove otherwise
Consistent mutable replication in large-scale
distributed storage systems is feasible and practical
Superior performance
Rigorous adherence to ordinary semantics