NFSv4 Replication for Grid Storage Middleware
Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor

Acknowledgements
- Joint work with Jiaying Zhang
- Partially supported by
  - NSF Middleware Initiative grant SCI-0438298
  - Network Appliance, Inc.

November 27, 2006, Workshop on Middleware for Grid Computing

Outline
- Motivation
- Design
- Evaluation

Motivation
- Emerging global scientific collaborations
  - Access to widely distributed data must be reliable, efficient, and convenient
- Current solution: GridFTP
  - Shared data sets are synchronized manually
- Our solution: NFSv4.r
  - Replicated file system
  - Excellent performance
  - Conventional file system semantics

Usage Scenario
[Figure: a scientist, a personal cluster of high performance computers, a massive cluster of high performance computers, and a visualization center, connected over a WAN to a file server]

Usage Scenario
[Figure: the same sites connected over a WAN, now with three file replication servers, each serving /nfs/user/bob/exp1]

Outline
- Motivation
- Design
  - Global name space
  - Consistent mutable replication
- Evaluation

Global Name Space
- /nfs is the global root of all NFS file systems
- Entries under /nfs are mounted on demand
- The format of reference names under /nfs follows DNS conventions
  - E.g., /nfs/umich.edu/lib/file1
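
A reference name under /nfs can be split into a DNS-style name and a path below it. The sketch below is only an illustration of that naming convention: the helper name `split_global_name` and the rule that exactly the first component under /nfs is the DNS-style name are assumptions, not details given in the slides.

```python
from pathlib import PurePosixPath


def split_global_name(path: str) -> tuple[str, str]:
    """Split '/nfs/<domain>/<rest>' into (domain, '/<rest>').

    Hypothetical helper: assumes the first component under /nfs is the
    DNS-style name, as in the slide's example.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "nfs":
        raise ValueError(f"not a name under /nfs: {path!r}")
    domain = parts[2]
    # Everything after the domain is the path inside that domain's tree.
    rest = "/" + "/".join(parts[3:])
    return domain, rest


print(split_global_name("/nfs/umich.edu/lib/file1"))
# -> ('umich.edu', '/lib/file1')
```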

Extended Use of DNS
- DNS SRV resource records carry NFS server location information
- The corresponding name server maps a logical name to one or more NFS servers
- Client-side utility enables transparent access to the global name space
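
As a rough sketch of how a client-side utility might use SRV data, the code below builds an SRV owner name and orders candidate servers. The service label `_nfs`, the host names under example.org, and the record values are all hypothetical; the slides do not specify them. Sorting by weight is a simplification: RFC 2782 calls for weighted random selection among records of equal priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    """Fields of a DNS SRV record (RFC 2782)."""
    priority: int  # lower value is tried first
    weight: int    # relative preference among equal priorities
    port: int
    target: str    # host name of the server


def srv_owner_name(domain: str, service: str = "_nfs", proto: str = "_tcp") -> str:
    # The "_nfs" service label is an assumption made for illustration.
    return f"{service}.{proto}.{domain}"


def order_servers(records: list[SrvRecord]) -> list[SrvRecord]:
    # Simplified ordering: lowest priority first, then highest weight.
    return sorted(records, key=lambda r: (r.priority, -r.weight))


records = [
    SrvRecord(20, 0, 2049, "backup.example.org"),
    SrvRecord(10, 5, 2049, "nfs2.example.org"),
    SrvRecord(10, 50, 2049, "nfs1.example.org"),
]
print(srv_owner_name("example.org"))
print([r.target for r in order_servers(records)])
```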

Outline
- Motivation
- Current work
  - Global Name Space
  - Consistent Mutable Replication
- Evaluation

Why Replication?
- Performance
  - Access distributed data from nearby or lightly loaded servers
- Failure resilience
  - Users and applications can switch from a failed replication server to a working one

Replication in Practice
- Read-only replication
  - E.g., AFS
  - Does not support complex data sharing, e.g., concurrent writes
  - Lacks network transparency for writes
- Optimistic replication
  - E.g., Coda
  - Focus is availability, not consistency

Consistent Mutable Replication
- Problem: state of the practice in file system replication does not satisfy the requirements of global scientific collaborations
- Solution: consistent mutable replication
- Question: can it provide Grid applications with efficient and reliable data access?

Requirements
- A server-to-server replication protocol
- Optimal read-only behavior
  - Performance identical to unreplicated system
- Consistent write behavior
  - Dynamically elect a primary server to coordinate concurrent writes
- Close-to-open semantics
  - An application opening a file sees the data written by the last application that wrote and closed the file

Replication Control: open
When a client opens a file for writing, the other replication servers are instructed to forward writes for that file to the selected server, which temporarily becomes the primary server for the file.
[Figure: client issuing wopen]

Replication Control: write
The primary server asynchronously distributes updates to the other servers during file modification.
[Figure: client issuing write]

Replication Control: close
After the active replication servers are synchronized, the primary server distributes the active view and withdraws as the primary server for the file.
[Figure: client issuing close]
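
The open, write, and close steps can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not the NFSv4.r implementation: the class `ReplicatedFile` and its methods are hypothetical, there is no network or election, and queued updates are applied to the other servers only at close, whereas the slides describe them as distributed asynchronously while the file is being modified.

```python
class ReplicatedFile:
    """Toy model of one file replicated on several servers."""

    def __init__(self, servers):
        self.copies = {s: b"" for s in servers}  # contents held by each server
        self.primary = None                      # set only while open for writing
        self.pending = []                        # updates not yet applied elsewhere

    def open_for_write(self, server):
        if self.primary is not None:
            raise RuntimeError("file already has a primary server")
        # The other servers would now forward writes for this file here.
        self.primary = server

    def write(self, data: bytes):
        if self.primary is None:
            raise RuntimeError("file is not open for writing")
        self.copies[self.primary] += data
        self.pending.append(data)  # simplification: flushed at close

    def close(self):
        # Synchronize the other servers, then withdraw as primary.
        for data in self.pending:
            for server in self.copies:
                if server != self.primary:
                    self.copies[server] += data
        self.pending.clear()
        self.primary = None


f = ReplicatedFile(["a", "b", "c"])
f.open_for_write("a")
f.write(b"result")
f.close()
print(f.copies)
```

After `close()` every copy holds the same bytes, which is what a later opener needs in order to observe close-to-open semantics.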

Consistency
- View-based control (El Abbadi, Skeen, and Cristian) guarantees sequential consistency
- A server becomes primary server after collecting acknowledgements from a majority of replication servers
- A primary server must ensure that every active replication server has acknowledged its role when a written file is closed
  - Guarantees close-to-open semantics
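
The majority condition for becoming primary can be written as a one-line check. In this minimal sketch the function name `can_become_primary` is hypothetical, and it is assumed that the candidate's own acknowledgement is included in `acks`; the slides do not say how the candidate is counted.

```python
def can_become_primary(acks: set[str], servers: set[str]) -> bool:
    """True if acknowledgements come from a strict majority of servers.

    Assumption: the candidate counts itself by appearing in `acks`.
    Acknowledgements from names outside `servers` are ignored.
    """
    return len(acks & servers) > len(servers) // 2


servers = {"s1", "s2", "s3"}
print(can_become_primary({"s1", "s2"}, servers))  # 2 of 3
print(can_become_primary({"s1"}, servers))        # 1 of 3
```

Because any two majorities of the same server set overlap, two servers cannot both collect a majority for the same file at the same time.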

Replication Server Failure
- Every server keeps track of the per-file liveness of other servers (active view)
- Primary server removes from the active view any server that fails to respond
- Primary server sends other servers the active view before releasing its role
- Active servers refuse any request that comes from a server not in the active view
- A failed replication server can rejoin the active group only after it synchronizes
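
The bookkeeping above can be sketched as a small set-based structure. The class `ActiveView` and its method names are hypothetical, and failure detection and synchronization are reduced to arguments passed in by the caller; this is an illustration of the rules on the slide, not the actual protocol.

```python
class ActiveView:
    """Toy per-file active view: the servers currently considered live."""

    def __init__(self, servers):
        self.members = set(servers)

    def drop_unresponsive(self, primary, responded):
        # The primary keeps itself plus every member that answered.
        self.members = {primary} | (self.members & set(responded))

    def accepts_request_from(self, server) -> bool:
        # Active servers refuse requests from servers outside the view.
        return server in self.members

    def rejoin(self, server, synchronized: bool) -> bool:
        # A failed server may rejoin only after it has synchronized.
        if synchronized:
            self.members.add(server)
        return server in self.members


view = ActiveView(["s1", "s2", "s3"])
view.drop_unresponsive(primary="s1", responded=["s2"])
print(sorted(view.members))
print(view.accepts_request_from("s3"))
```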

Primary Server Failure
- File becomes inaccessible
- Modification to El Abbadi et al. to allow asynchronous update
- Ensures durability of data written by a client and acknowledged by the server
- Clients can continue to access objects that are outside the control of the failed server
- Applications decide whether to wait for the failed server to recover or to reproduce the computation results

Hierarchical Replication Control
- Primary server election is costly over WAN
- Heuristic: hierarchical replication control
  - A primary server can assert control at different granularities
  - Reduces costly elections when there is locality of reference

Shallow Control
- A server with shallow control on a file or directory is the primary server for that single object
[Figure: directory tree showing /usr, bin, and local]

Deep Control
- A server with deep control on a directory is the primary server for everything in the subtree rooted at that directory
[Figure: directory tree showing /usr, bin, and local]
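
The difference between the two granularities can be illustrated with a path lookup. This is a sketch under assumptions: the function `controlling_server`, the two dictionaries, and the server names are hypothetical, and checking shallow control before deep control is an arbitrary choice made here, not something the slides specify.

```python
from pathlib import PurePosixPath


def controlling_server(path: str, shallow: dict, deep: dict):
    """Return the primary server for `path`, or None if no server holds control.

    shallow maps a path to the server controlling that single object.
    deep maps a directory to the server controlling its whole subtree.
    """
    p = PurePosixPath(path)
    if str(p) in shallow:
        return shallow[str(p)]
    # Deep control on any ancestor (or on the object itself) covers the path.
    for candidate in [p, *p.parents]:
        if str(candidate) in deep:
            return deep[str(candidate)]
    return None


deep = {"/usr": "server1"}
shallow = {"/home/bob/file": "server2"}
print(controlling_server("/usr/bin/ls", shallow, deep))
print(controlling_server("/home/bob/other", shallow, deep))
```

With deep control on /usr, opening many files below it for writing needs no further election, which is where the saving under locality of reference comes from.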

Outline
- Motivation
- Current work
- Evaluation

NAS Grid Benchmarks
- An evaluation tool released by NASA for Grid computing
- An instance of NGB
  - class (mesh size, number of iterations)
  - source(s) of input data
  - consumer(s) of solution values

Four NGB Problems
- Embarrassingly Distributed (ED)
- Helical Chain (HC)
- Visualization Pipe (VP)
- Mixed Bag (MB)
[Figure: data flow graph for each problem, running from Launch to Report through nodes labeled SP, BT, LU, MG, and FT]

Experiment Setup

Helical Chain (Small)

Helical Chain Medium

Helical Chain Large

Helical Chain Huge

Visualization Pipe Small

Visualization Pipe Medium

Visualization Pipe Large

Visualization Pipe Huge

Conclusion
- Conventional wisdom
  - Consistent mutable replication in large-scale distributed storage systems is too expensive to consider
- Our experiments prove otherwise
  - Consistent mutable replication in large-scale distributed storage systems is feasible and practical
  - Superior performance
  - Rigorous adherence to ordinary semantics

Thank you for your attention!
Questions?
www.citi.umich.edu
