GPFS on a Cray XT
Shane Canon
Data Systems Group Leader
Lawrence Berkeley National Laboratory
CUG 2009, Atlanta, GA
May 4, 2009
Outline
- NERSC Global File System
- GPFS Overview
- Comparison of Lustre and GPFS
- Mounting GPFS on a Cray XT
- DVS
- Future Plans
NERSC Global File System
- The NERSC Global File System (NGF) provides a common global file system for the NERSC systems.
- In production since 2005.
- Currently mounted on all major systems – IBM SP, Cray XT4, SGI Altix, and commodity clusters.
- Currently provides Project space.
- Targeted for files that need to be shared across a project and/or used on multiple systems.
NGF and GPFS
- NERSC signed a contract with IBM in July 2008 for GPFS.
- Contract extends through 2014.
- Covers all major NERSC systems through NERSC6, including "non-Leadership" systems such as Bassi and Jacquard.
- Option for NERSC7.
NGF Topology
[Diagram: NGF topology. NGF disk sits on the NGF SAN, with an NGF-Franklin SAN linking it to the Franklin login, DVS, and compute nodes; the NGF nodes serve Bassi (via Bassi pNSDs), Planck (via Planck pNSDs), PDSF, and Jacquard over an Ethernet network; Franklin disk (Lustre scratch) sits on the separate Franklin SAN.]
GPFS Overview
- Shared-disk model
- Distributed lock manager
- Supports SAN mode and Network Shared Disk (NSD) mode, which can be mixed (see the sketch below)
- Primarily TCP/IP, but supports RDMA and Federation for low-overhead, high-bandwidth transport
- Feature rich and very stable
- Largest deployment: LLNL Purple, 120 GB/s, ~1,500 clients
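As a rough illustration of the shared-disk/NSD model, the sketch below registers two shared LUNs as NSDs, creates a file system on them, and mounts it cluster-wide. This is a minimal sketch using GPFS 3.x-era administration commands; the device names, server names, and descriptor field layout are assumptions and vary by GPFS release.

    # Disk descriptor file: one shared LUN per line, with its NSD server pair.
    # (Hypothetical devices/servers; field layout varies by release - see mmcrnsd.)
    cat > /tmp/disks.lst <<'EOF'
    /dev/sdb:nsd01:nsd02:dataAndMetadata:1
    /dev/sdc:nsd01:nsd02:dataAndMetadata:2
    EOF

    mmcrnsd -F /tmp/disks.lst                      # register the LUNs as NSDs
    mmcrfs /gpfs/ngf ngf -F /tmp/disks.lst -B 1M   # create the file system
    mmmount ngf -a                                 # mount on all cluster nodes

SAN-attached nodes read the disks directly over the fabric, while nodes without SAN connectivity reach the same NSDs through the NSD servers over TCP/IP; that is what makes the mixed mode possible.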
Comparisons – Design and Capability
                            GPFS             Lustre
   Storage Model            Shared Disk      Object
   Locking                  Distributed      Central (OST)
   Transport                TCP (w/ RDMA)    LNET (routable multi-network)
   Clients (demonstrated)   1,500            25,000
   Bandwidth (demonstrated) 120 GB/s         200 GB/s
GPFS Architecture
[Diagram: GPFS architecture. SAN clients access the disks directly over the SAN; TCP and RDMA (IB) clients reach the same disks through NSD servers.]
Lustre Architecture
[Diagram: Lustre architecture. TCP and RDMA clients on separate networks (Net1, Net2) reach the MDS/OSS servers, with an LNET router bridging the networks.]
Comparisons – Features
   Feature                GPFS   Lustre
   Add Storage            ✔      ✔
   Remove Storage         ✔
   Rebalance              ✔
   Pools                  ✔      1.8 (May)
   Fileset                ✔
   Quotas                 ✔      ✔
   Distributed Metadata   ✔      3.0 (2010/11)
   Snapshots              ✔
   Failover               ✔      with user/third-party assistance
GPFS on Franklin Interactive Nodes
- Franklin has 10 login nodes and 6 PBS launch nodes
- Currently uses the native GPFS client and TCP-based mounts on the login nodes (see the sketch below)
- Hardware is in place to switch to a SAN-based mount on the login nodes in the near future
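For reference, a TCP-based mount of a file system owned by another GPFS cluster is normally configured with the GPFS multi-cluster commands. The sketch below is a minimal, hypothetical example (cluster name, contact nodes, and mount point are assumptions, and the mmauth key exchange is omitted), not NERSC's actual configuration.

    # On the client cluster: register the owning cluster and its contact nodes.
    # (Names are hypothetical; key exchange via mmauth is omitted for brevity.)
    mmremotecluster add ngf.nersc.gov -n contact1,contact2

    # Map the remote file system to a local device name and mount point.
    mmremotefs add ngf -f ngf -C ngf.nersc.gov -T /project

    mmmount ngf -a   # mount on all nodes of the client cluster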
GPFS on Cray XT
- Mostly "just worked"
- Installed in the shared-root environment
- Some modifications needed to point to the correct source tree
- Slight modifications to the mmremote and mmsdrfsdef utility scripts (to use the ip command to determine the SeaStar (SS) IP address), as sketched below
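The change makes the scripts derive the node's address on the high-speed network by parsing the output of the ip command. A minimal sketch of that lookup, assuming a hypothetical interface name (the actual interface is site- and release-specific):

    # Hypothetical: find the IPv4 address on the high-speed network interface
    # using iproute2, since other lookup tools may not behave as expected here.
    IFACE=ss0   # assumed interface name; adjust for your system
    NODE_IP=$(ip -4 addr show dev "$IFACE" |
              awk '/inet / {sub(/\/.*/, "", $2); print $2; exit}')
    echo "$NODE_IP"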
Franklin Compute Nodes
- NERSC will use Cray's DVS to mount NGF file systems on Franklin compute nodes.
- DVS ships IO requests to server nodes that have the actual target file system mounted.
- DVS has been tested with GPFS at NERSC at scale on Franklin during dedicated test shots.
- Targeted for production in the June time frame.
- Franklin has 20 DVS servers connected via SAN.
IO Forwarders
- IO forwarder / function shipping: moves IO requests to a proxy server running the file system client.
Advantages
- Less overhead on clients
- Reduced scale from the file system's viewpoint
- Potential for data redistribution (realigning and aggregating IO requests)
Disadvantages
- Additional latency (from the extra stack)
- Additional software component (complexity)
Overview of DVS
- Portals based (low overhead)
- Kernel modules (both client and server)
- Support for striping across multiple servers (future release; tested at NERSC)
- Tunables to adjust behavior (see the sketch below):
– "Stripe" width (number of servers)
– Block size
– Set via both mount options and environment variables
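As an illustration only: a DVS projection is typically described by an fstab-style entry naming the DVS servers plus tunables, with environment variables overriding them per job. The option and variable names below are assumptions (drawn from later DVS releases) and may not match the DVS version discussed here.

    # Hypothetical fstab-style entry projecting /project to compute nodes.
    # Option names (path, nodename, blksize, maxnodes) are assumed; check
    # the DVS documentation for your CLE release.
    /project  /project  dvs  path=/project,nodename=nid00008:nid00012,blksize=1048576,maxnodes=2  0 0

    # Per-job overrides via environment variables (names assumed):
    export DVS_MAXNODES=4        # "stripe" width: number of DVS servers used
    export DVS_BLOCKSIZE=524288  # transfer block size in bytes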
NGF Topology (again)
[Diagram: NGF topology, repeated from the earlier slide.]
Future Plans
- No current plans to replace Franklin scratch with NGF scratch or GPFS; however, we plan to evaluate this once the planned upgrades are complete.
- Explore global scratch. This could start with a smaller Linux cluster to prove feasibility.
- Evaluate tighter integration with HPSS (GHI).
Long Term Questions for GPFS
- Scaling to new levels (O(10k) clients)
- Quality of service in a multi-cluster environment (where the aggregate bandwidth of the systems exceeds the disk subsystem)
- Support for other systems, networks, and scale
– pNFS could play a role
– Other options:
- A generalized IO forwarding system (DVS)
- A routing layer with an abstraction layer to support new networks (LNET)
Acknowledgements
NERSC
- Matt Andrews
- Will Baird
- Greg Butler
- Rei Lee
- Nick Cardo
Cray
- Terry Malberg
- Dean Roe
- Kitrick Sheets
- Brian Welty