
External Services on the NERSC Hopper System
Katie Antypas, Tina Butler, and Jonathan Carter
Cray User Group, May 27th, 2010

NERSC is the Production Facility for DOE Office of Science
• NERSC serves a large population
  – Approximately 3000 users, 400 projects, 500 code instances
  – [Figure: 2009 Allocations]
• Focus on
  – Expert consulting and other services
  – High end computing systems
  – Global storage systems
  – Interface to high speed networking
• Science-driven
  – Machine procured competitively using application benchmarks from DOE/SC
  – Allocations controlled by DOE/SC Program Offices to couple with funding decisions

NERSC Systems for Science
Large-Scale Computing Systems
• Franklin (NERSC-5): Cray XT4
  – 9,532 compute nodes; 38,128 cores
  – ~25 Tflop/s on applications; 356 Tflop/s peak
• Hopper (NERSC-6): Cray XT
  – Phase 1: Cray XT5, 668 nodes, 5,344 cores
  – Phase 2: > 1 Pflop/s peak (late 2010 delivery)
Clusters
• Carver
  – IBM iDataplex cluster
• PDSF (HEP/NP)
  – Linux cluster (~1K cores)
• Cloud testbed
  – IBM iDataplex cluster
NERSC Global Filesystem (NGF)
• Uses IBM’s GPFS
• 1.5 PB; 5.5 GB/s
HPSS Archival Storage
• 59 PB capacity
• 11 tape libraries
• 140 TB disk cache
Analytics / Visualization
• Euclid large memory machine (512 GB shared memory)
• GPU testbed, ~40 nodes

Hopper System
Phase 1 - XT5
• 668 nodes, 5,344 cores
• 2.4 GHz AMD Opteron (Shanghai, 4-core)
• 50 Tflop/s peak
• 5 Tflop/s SSP
• 11 TB DDR2 memory total
• Seastar2+ Interconnect
• 2 PB disk, 25 GB/s
• Air cooled
Phase 2
• ~6400 nodes, ~150,000 cores
• 1.9+ GHz AMD Opteron (Magny-Cours, 12-core)
• ~1.0 Pflop/s peak
• ~100 Tflop/s SSP
• ~200 TB DDR3 memory total
• Gemini Interconnect
• 2 PB disk, ~70 GB/s
• Liquid cooled
[Timeline axis: 3Q09, 4Q09, 1Q10, 2Q10, 3Q10, 4Q10]
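The Phase 1 peak figure can be sanity-checked from the core count and clock rate listed for the Hopper system. The sketch below is only a rough estimate: it assumes 4 double-precision floating-point operations per core per cycle, a value that is not given on the slide, and the helper name `peak_tflops` is hypothetical.

```python
# Rough peak-performance estimate: cores x clock (GHz) x flops per cycle.
# ASSUMPTION: 4 double-precision flops per core per cycle (not stated on the slide).
FLOPS_PER_CYCLE = 4


def peak_tflops(cores: int, clock_ghz: float,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Return an estimated theoretical peak in Tflop/s."""
    # cores * GHz * flops/cycle gives Gflop/s; divide by 1000 for Tflop/s.
    return cores * clock_ghz * flops_per_cycle / 1000.0


# Phase 1: 668 nodes, 5,344 cores (8 cores per node), 2.4 GHz.
cores_per_node = 5344 // 668
phase1 = peak_tflops(5344, 2.4)
print(f"Phase 1: {cores_per_node} cores/node, ~{phase1:.1f} Tflop/s estimated peak")

# Phase 2 figures on the slide are approximate ("~150,000 cores", "1.9+ GHz"),
# so this is only an order-of-magnitude check against "~1.0 Pflop/s peak".
phase2 = peak_tflops(150_000, 1.9)
print(f"Phase 2: ~{phase2 / 1000:.2f} Pflop/s estimated peak")
```

Under that assumption the Phase 1 estimate comes out close to the quoted 50 Tflop/s, and the Phase 2 estimate is of the same order as the quoted ~1.0 Pflop/s.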

Feedback from NERSC Users was crucial to designing Hopper
• User feedback from Franklin: Login nodes need more memory
  – Hopper enhancement: 8 external login nodes with 128 GB of memory (with swap space)
• User feedback from Franklin: Connect NERSC Global FileSystem to compute nodes
  – Hopper enhancement: Global file system will be available to compute nodes
• User feedback from Franklin: Workflow models are limited by memory on MOM (host) nodes
  – Hopper enhancement: Increased # and amount of memory on MOM nodes
  – Hopper enhancement: Phase II compute nodes can be repartitioned as MOM nodes

Feedback from NERSC users was crucial to designing Hopper
• User feedback from Franklin: Improve Stability and Reliability
  – Hopper enhancement: External login nodes will allow users to log in, compile and submit jobs even when the computational portion of the machine is down
  – Hopper enhancement: External file system will allow users to access files if the compute system is unavailable and will also give administrators more flexibility during system maintenance
  – Hopper enhancement: For Phase 2, Gemini interconnect has redundancy and adaptive routing

Hopper Phase 1 - Key Dates
• Phase 1 system arrives: Oct 12, 2009
• Integration complete: Nov 18, 2009
• Earliest users on system: Nov 18, 2009
• All user accounts enabled: Dec 15, 2009
• System accepted: Feb 2, 2010
• Account charging begins: Mar 1, 2010

Hopper Installation
[Photos: Delivery, Unwrap, Install]

Hopper Phase I Utilization
[Figure: utilization chart; annotations: "Max 127k", "system maintenance", "system maintenance and dedicated I/O testing"]
• Users were able to immediately utilize the Hopper system
• Even with dedicated testing and maintenance times, Hopper utilization from Dec 15th to March 1st reached 90%

Phase 1 Schematic
[Diagram; recoverable labels: Main System; SMW; Mgt Server; Es* management network; NERSC GigE LAN; NERSC 10GbE LAN; NERSC GPFS; FC-8 SAN; External Storage to HPSS; 4 esDM Servers; DDR/QDR IB Switch Fabric; Metadata; MDS; MDS; Spare; LSI 3992; RAID 1+0 (x2); 48 OSSes; FC-8 Switch Fabric; 24 LSI 7900, each labeled "12 LUNs"]

System Configuration
Nodes                                  Chip             Freq      Memory
664 Compute                            2 x Opteron QC   2.4 GHz   16 GB
36 (10 DVS + 24 Lustre + 2 Network)    1 x Opteron DC   2.6 GHz   8 GB
4 Service                              1 x Opteron DC   2.6 GHz   8 GB
12 DVS (Shared root)                   2 x Opteron QC   2.4 GHz   16 GB
6 MOM                                  1 x Opteron DC   2.6 GHz   8 GB

ES System Configuration
Nodes            Server      Chip             Freq       Memory
8 Login          Dell R905   4 x Opteron QC   2.4 GHz    128 GB
48 OSS + 3 MDS   Dell R805   4 x Opteron QC   2.6 GHz    16 GB
4 DM             Dell R805   4 x Opteron QC   2.6 GHz    16 GB
MS               Dell R710   4 x Xeon QC      2.67 GHz   48 GB
• 24 LSI 7900 controllers
• 120 TB configured as 12 RAID6 LUNs per controller

esLogin
• Goals
  – Ability to run post-processing and other small applications directly on login nodes without interfering with other users
  – Faster compilations
  – Ability to access data and submit jobs if system goes down
• Solutions
  – Cray packaged software updates both internal and external nodes
  – Run local batch servers transparently
  – Configuration management software, e.g. SystemImager
• Challenges
  – New for Cray; one of first sites
  – Creating a consistent environment between external and internal nodes
  – Configuring batch environment with external login nodes
  – Provisioning and configuration management
• Results
  – Users report more responsive login nodes
  – “The login nodes are much more responsive, I haven't had any of the issues I had with Franklin in the early days.” Martin White
  – No complete cluster mgt system yet

esFS
• Goals
  – Highly available filesystem
  – Ability to access data when system is unavailable
• Solutions
  – With manual failover, servers can be updated via a rolling upgrade, reducing downtime
  – Configuration management software, e.g. SystemImager
• Challenges
  – Different support model
  – Oracle-supported Lustre 1.8 GA server, Cray-supported 1.6 clients
  – Automatic failover, assuring that if one OSS or MDS fails the spare picks up
  – Provisioning and configuration management
• Results
  – Users report a stable reliable system
  – “I have had no problems compiling etc, and my jobs have had a very high success rate.” Andrew Aspen
  – No complete cluster mgt system yet
  – No automatic failover yet
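The rolling-upgrade idea from the esFS slide can be illustrated with a toy model. This is a minimal sketch, not the procedure NERSC used: it assumes a single spare server that temporarily takes over each server's storage targets, and every name in it (`Server`, `rolling_upgrade`, the server and target labels, the version strings) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    version: str
    serving: set = field(default_factory=set)  # storage targets currently served


def rolling_upgrade(servers, spare, new_version):
    """Update one server at a time; the spare serves its targets in the meantime.

    ASSUMPTION: a single idle spare can take over any server's targets.
    Returns the order in which servers were updated.
    """
    order = []
    for node in servers:
        targets = set(node.serving)
        spare.serving |= targets      # manual failover: spare takes over
        node.serving.clear()
        node.version = new_version    # node is idle, so it can be updated
        node.serving |= targets       # fail back to the updated node
        spare.serving -= targets
        order.append(node.name)
    spare.version = new_version       # spare is idle again; update it last
    order.append(spare.name)
    return order


oss = [Server("oss1", "old", {"ost0", "ost1"}),
       Server("oss2", "old", {"ost2", "ost3"})]
spare = Server("spare", "old")
print(rolling_upgrade(oss, spare, "new"))
```

At every step each target is served by exactly one server, which is the property that lets the filesystem stay available while servers are updated.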

esDM
• Goals
  – Offload traffic to/from mass storage system from login nodes
• Challenges
  – Consistent user interface to mass storage system
• Solutions
  – Client modified for third-party transfers
• Results
  – Expect main benefits for Phase 2
  – Porting client to internal login nodes

Data and Batch Access
• Prepare and submit jobs when XT down
  – Compile applications and prepare input
  – Local Torque servers on login nodes provide routing queues
  – Holds jobs while XT is down
  – Jobs forwarded to internal XT Torque server when XT available
  – Batch command wrappers hide complexity of multiple servers and ensure consistent view
[Diagram labels: Login Nodes (Local Torque Server Routes Jobs); Internal XT system (Compute nodes, Mom nodes, DVS nodes, Internal PBS server); Login nodes mount file systems; /project file system; /scratch file system]

Data and Batch Access
• Prepare and submit jobs when XT down
  – Compile applications and prepare input
  – Local Torque servers on login nodes provide routing queues
  – Holds jobs while XT is down
  – Jobs forwarded to internal XT Torque server when XT available
  – Batch command wrappers hide complexity of multiple servers and ensure consistent view
[Diagram labels: Login Nodes (Local Torque Server Holds Jobs); Internal XT system; Login nodes mount file systems; /project file system; /scratch file system]
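The hold-and-forward behavior described on the two Data and Batch Access slides can be sketched as a small simulation. This is a toy model only: it does not use any Torque or PBS interface, and the class and method names (`LocalRoutingQueue`, `submit`, `flush`) and job labels are hypothetical.

```python
from collections import deque


class LocalRoutingQueue:
    """Toy model of a login-node batch server that routes jobs to an internal server."""

    def __init__(self):
        self.held = deque()    # jobs waiting on the login node while the XT is down
        self.forwarded = []    # jobs handed to the internal server, in arrival order

    def submit(self, job, xt_available):
        """Accept a job at any time; forward immediately only if the XT is up."""
        self.held.append(job)
        if xt_available:
            self.flush()

    def flush(self):
        """Forward every held job to the internal server, oldest first."""
        while self.held:
            self.forwarded.append(self.held.popleft())


queue = LocalRoutingQueue()
queue.submit("job-1", xt_available=False)  # XT down: job is held
queue.submit("job-2", xt_available=False)
queue.flush()                              # XT back up: held jobs are forwarded
queue.submit("job-3", xt_available=True)   # XT up: routed straight through
print(queue.forwarded)
```

From the user's side submission looks the same whether the XT is up or down, which is the "consistent view" the batch command wrappers are meant to provide.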

Summary
• Benefits
  – Improved reliability and usability
• Challenges
  – Not a standardized offering
    • One-of-a-kind systems by Custom Engineering
    • Software levels different from Cray products
  – Synchronization & Consistency
    • Lack of complete cluster management system
    • Software packaging
• Recommendations
  – A product based on external services

Enabling New Science
This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
