Cray Data Virtualization Service: A Method for Heterogeneous File System Connectivity


  1. Cray Data Virtualization Service: A Method for Heterogeneous File System Connectivity. David Wallace and Stephen Sugiyama, Cray Inc., April 14, 2008

  2. Cray Data Virtualization Service
     • A little background and history
     • Cray DVS support in CLE 2.1
     • What if…

  3. DVS: Background and History
     • Concept derived from Cray T3E system call forwarding
       - Focused on the I/O forwarding aspects
     • Initial work focused on clusters
     • Some design objectives:
       - Provide a low(er)-cost alternative to having HBAs on all nodes in a cluster
       - Utilize the bandwidth and capacity of a cluster's high-speed network
       - Provide global access to file systems resident on I/O nodes
       - Provide high-performance, parallel file system access
       - Target the I/O patterns of High Performance Computing applications: very large block sizes, sequential access, low data re-use

  4. Serial DVS: Multiple Clients to a Single Server
     [Diagram: client nodes n1 through n5 each mount /dvs and reach a single I/O node over a high-speed network (Ethernet, Quadrics, Myrinet); the I/O node hosts a local EXT3 or XFS file system on direct-attached or SAN storage (/dev/sda1)]
     • DVS uses a client-server model
       - The DVS server(s) stack on the local file system(s) managing the disks
     • Serial DVS
       - All requests are routed through a single DVS server
       - Provides functionality similar to NFS

  5. Single DVS Server to a Single I/O Device
     [Diagram: a client with a /dvs mount forwards I/O to one DVS server on an I/O node with an EXT3 or XFS file system on direct-attached or SAN storage (/dev/sda1)]
     • Open, read, and write requests are passed to the VFS and intercepted by the DVS client
     • The DVS client forwards the request to the DVS server
     • On the server, the request is passed to the local file system
       - Metadata and locking operations are local
       - Data is read from and written to disk
       - Standard Linux buffer management is used: local cache, I/O readahead and write-behind

  6. DVS: Multiple I/O Node Support
     [Diagram: clients A through D see one virtualized file system (/dvs) over the DVS storage network, served by DVS servers on I/O nodes SIO-0 through SIO-3]

  7. DVS: Parallel File System
     [Diagram: a write('/x/y/z') request from clients A through D is forwarded across the DVS servers on SIO-0 through SIO-3; each server aggregates with its own buffer cache, readahead, and write-behind. A conceptual striping sketch follows below.]
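  The slide does not spell out how a given byte range of the file is assigned to a particular DVS server, only that requests are spread across the servers and that each one aggregates locally. A common scheme for this kind of parallel forwarding is round-robin striping by fixed-size blocks; the C sketch below illustrates that idea under assumed values (the 1 MiB block size, the four-server count, and the function name are hypothetical, not the actual DVS policy).

      /* Conceptual sketch: round-robin block striping across DVS servers.
       * This illustrates the general technique, not Cray DVS internals;
       * the block size and server count are assumptions. */
      #include <stdio.h>

      #define BLOCK_SIZE  (1024 * 1024)   /* assumed stripe unit: 1 MiB */
      #define NUM_SERVERS 4               /* e.g. SIO-0 .. SIO-3 */

      /* Map a file offset to the server that would handle it. */
      static int server_for_offset(long long offset)
      {
          return (int)((offset / BLOCK_SIZE) % NUM_SERVERS);
      }

      int main(void)
      {
          /* A 4 MiB write starting at offset 0 covers four blocks,
           * so successive blocks land on successive servers. */
          for (long long off = 0; off < 4LL * BLOCK_SIZE; off += BLOCK_SIZE)
              printf("offset %lld -> SIO-%d\n", off, server_for_offset(off));
          return 0;
      }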

  8. So, Where Are We Today?

  9. Cray XT Scalable Lustre I/O: Direct Attached
     [Diagram: a Cray XT supercomputer with compute nodes, login nodes, Lustre OSS and MDS nodes, an NFS server, and a boot/SDB node; 1 GigE backbones and 10 GigE links connect the Lustre global parallel filesystem servers and the backup and archive servers]
     • Each compute node runs a Lustre client
     • The NFS server allows the Lustre filesystem to be exported to other systems on the network

  10. Cray XT Accessing Lustre and Remote Filesystems
     [Diagram: the same Cray XT configuration with compute nodes, login nodes, Lustre OSS and MDS nodes, NFS clients, and a boot/SDB node; a remote system's NFS server exports the remote filesystem over the 1 GigE backbone]
     • Each compute and login node runs a Lustre client
       - Lustre is accessible from every node within the Cray XT system
     • Each login node imports the remote filesystem
       - The filesystem is accessible only from login nodes
       - For global accessibility, login nodes need to copy files into Lustre

  11. Why Not Use NFS Throughout?
     [Diagram: the same configuration as the previous slide, but the remote NFS filesystem is imported by compute nodes as well as login nodes]
     • Each compute and login node runs a Lustre client
       - Lustre is accessible from every node within the XT system
     • Each login AND compute node imports the remote filesystem
       - The remote filesystem is accessible from all nodes
       - No copies are required for global accessibility

  12. Issues with NFS
     • Cray XT systems have thousands of compute nodes
       - A single NFS server typically cannot manage more than about 50 clients
       - NFS servers could be cascaded, but:
         - There can be only a single primary NFS server
         - The other servers would run as clients on the incoming side and servers on the outbound side
         - Cascaded servers would have to run in user space
       - This complicates system administration
     • Performance impacts
       - NFS clients run as daemons, introducing OS jitter on the compute nodes
       - The primary NFS server would be overwhelmed
       - The TCP/IP protocol is less efficient than the native protocol within the Cray SeaStar network

  13. Immediate Need
     • Access to NFS-mounted file systems (/home on login nodes)
       - For customers migrating from CVN, functionality equivalent to YOD I/O

  14. DVS: Support for NFS in CLE 2.1
     [Diagram: clients A through D access the virtualized file system over the DVS storage network; the DVS servers on SIO-0 through SIO-3 forward requests to an NFS-mounted file system]

  15. Cray DVS from the Admin Point of View
     [Diagram: Cray XT4 compute nodes run DVS clients; over the HSN they reach a DVS server on an SIO node, which holds the NFS mount from an external NFS server]
     • The admin mounts file systems in fstab as usual, e.g. mount -t dvs /nfs/user/file4 (a fuller sketch of the general form follows below)
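  The mount command on the slide is abbreviated. As a rough sketch only, a DVS projection is mounted through the standard mount(8) interface by naming dvs as the file system type; the source path, mount point, and options below are placeholders rather than the exact CLE 2.1 syntax, and DVS-specific options (for example, which SIO node serves the mount) are deliberately omitted rather than guessed.

      # Illustrative only: generic mount(8) form for a DVS-projected file system.
      # Paths are placeholders; DVS-specific options are not shown.
      mount -t dvs /nfs/user /nfs/user

      # Rough /etc/fstab equivalent with the same placeholder fields:
      # <source>   <mount point>   <type>   <options>   <dump>  <pass>
      /nfs/user    /nfs/user       dvs      defaults    0       0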

  16. From the User's Point of View
     [Diagram: an application on a compute node calls open(), read(), write(), and close() on /nfs/home/dbw/file]
     • The file system appears to be local (see the sketch below)
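  Because the DVS client intercepts requests below the VFS, applications use ordinary POSIX I/O with no changes. The short C program below is a minimal illustration of that transparency; the path is the one shown on the slide, and nothing in the code is DVS-specific, which is the point.

      /* Minimal sketch: ordinary POSIX I/O on a DVS-projected path.
       * Nothing here is DVS-specific; the DVS client forwards these
       * operations to the server transparently. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
          const char *path = "/nfs/home/dbw/file";   /* path from the slide */
          const char msg[] = "hello from a compute node\n";
          char buf[64];

          int fd = open(path, O_RDWR | O_CREAT, 0644);
          if (fd < 0) { perror("open"); return 1; }

          if (write(fd, msg, strlen(msg)) < 0)       /* forwarded by DVS */
              perror("write");

          lseek(fd, 0, SEEK_SET);                    /* rewind and read back */
          ssize_t n = read(fd, buf, sizeof buf - 1);
          if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }

          close(fd);
          return 0;
      }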

  17. So What If…
     • Users have data on one or more file systems (e.g. EXT3, XFS, GPFS, Panasas, NFS) on servers in the data center that they want to access from the compute nodes of the Cray XT system WITHOUT having to copy files?
     Then…
     • You install the Cray DVS (Data Virtualization Service) server software on each of the existing file servers and the DVS client software on each of the compute nodes, which allows the admin to "mount -t dvs" those file systems

  18. DVS Concept in a Cray XT Environment
     [Diagram: Cray XT4 compute nodes run DVS clients; over the SeaStar interconnect they reach DVS servers on SIO nodes; each SIO node stacks its DVS server on a PanFS client, a GPFS client, an NFS client, or a local EXT3/4 or XFS file system on direct-attached storage (DAS), backed by SAM-QFS, GPFS, and NFS servers]

  19. Cray DVS: Summary
     • No file copying required!
     • The simplicity of the DVS client allows for larger scale than NFS or cluster file systems
     • DVS can extend the number of compute nodes serviced to O(10000)
       - It can project a file system beyond the limits of the underlying clustered file system (e.g. GPFS on Linux is limited to 512 clients)
     • The seek, latency, and transfer time (physical disk I/O) of every I/O node is overlapped (mostly parallel)
     • Every I/O node does read-ahead and write aggregation (in parallel)
     • The effective page cache size is the aggregate of all the I/O nodes' page caches
     • The interconnects are utilized more fully:
       - Multiple I/O nodes can drive a single app node's interconnect at its maximum speed
       - Multiple app nodes can drive all of the I/O node interconnects at their maximum speed
     • Takes advantage of RDMA on interconnects that support it (Cray SeaStar, Quadrics, Myricom)

  20. Cray DVS: Initial Customer Usage
     • ORNL
       - Began field trial in December 2007
       - Installed in production on ~7,200 XT3 cores
       - Replacement for Catamount YOD-I/O functionality
       - Mounting NFS-mounted /home file systems on XT compute nodes
     • CSC (Finland)
       - Working with Cray to test DVS with GPFS
       - Installed on a TDS and undergoing development and testing
     • CSCS
       - Begins early-access testing in 2Q2008

  21. Acknowledgements
     • David Henseler, Cray Inc.
     • Kitrick Sheets, KBS Software
     • Jim Harrell, Cray Inc.
     • Chris Johns, Cassatt Corp.
     • The DVS development team: Stephen Sugiyama, Tim Cullen, Brad Stevens, and others

  22. Thank You
