The Hopper System: How the Largest* XE6 in the World Went From Requirements to Reality
Katie Antypas, Tina Butler, and Jonathan Carter. CUG 2011, May 25, 2011.


  1. The Hopper System: How the Largest* XE6 in the World Went From Requirements to Reality. Katie Antypas, Tina Butler, and Jonathan Carter. CUG 2011, May 25, 2011.

  2. Requirements to Reality: Develop RFP → Select vendor partner → Negotiate SOW → Deliver and test system → Transition to steady state.

  3. RFP Draws from User Requirements
     • 13 'Minimum Requirements' (e.g., 24x7 support) that absolutely must be met; proposals that don't meet them are not responsive and are not evaluated further
     • 38 'Performance Features' (e.g., fully featured development environment), a wish list of features evaluated qualitatively via in-depth study of the Offeror narrative
     • Benchmarks
     • Supplier attributes (ability to produce/test, corporate risk, commitment to HPC, etc.)
     • Cost of ownership (incl. life-cycle, facilities, base, and ongoing costs) and affordability
     • Best Value Source Selection allows NERSC to evaluate and select the proposal that represents the best value

  4. NERSC-6 Benchmarks (from full workload down to system components)
     • Composite tests (full workload): SSP, Consistency
     • Full applications: CAM, GTC, MILC, GAMESS, PARATEC, IMPACT-T, MAESTRO
     • Stripped-down app: AMR Elliptic Solve
     • Kernels: NPB Serial, NPB Class D, UPC NPB, FCT
     • System component tests: Stream, PSNAP, Multipong, IOR, MetaBench, NetPerf

  5. NERSC-6 SSP Metric
     The largest-concurrency run of each full application benchmark is used to calculate the SSP:
     CAM 240p, GAMESS 1024p, GTC 2048p, IMPACT-T 1024p, MAESTRO 2048p, MILC 8192p, PARATEC 1024p
     For each benchmark, measure:
     • FLOP counts on a reference system
     • Wall clock run time on various systems
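
     A short sketch may help make the metric concrete. This is a hedged illustration, assuming the usual NERSC SSP construction (a per-core rate for each application computed from reference-system flop counts and measured wall-clock time, combined via a geometric mean and scaled by the machine's core count); the flop counts and times below are placeholders, not measured Hopper data.

     ```python
     # Hedged sketch of an SSP-style calculation (illustrative values only).
     # Assumes per-application rate = reference flop count / (cores * wall time),
     # and SSP = geometric mean of those per-core rates times the total core count.
     from math import prod

     # benchmark -> (reference flop count, cores used, measured wall-clock seconds)
     # All numbers are made-up placeholders, not NERSC measurements.
     runs = {
         "CAM":      (1.0e15,  240, 900.0),
         "GAMESS":   (2.0e15, 1024, 800.0),
         "GTC":      (4.0e15, 2048, 700.0),
         "IMPACT-T": (1.5e15, 1024, 600.0),
         "MAESTRO":  (3.0e15, 2048, 750.0),
         "MILC":     (9.0e15, 8192, 650.0),
         "PARATEC":  (1.8e15, 1024, 500.0),
     }

     def ssp(runs, total_cores):
         """Geometric mean of per-core rates (Gflop/s/core) scaled by total cores."""
         rates = [flops / (cores * secs) / 1e9 for flops, cores, secs in runs.values()]
         geo_mean = prod(rates) ** (1.0 / len(rates))
         return geo_mean * total_cores  # Gflop/s

     print(f"SSP ~ {ssp(runs, 153216) / 1e3:.1f} Tflop/s (placeholder inputs)")
     ```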

  6. Cray Proposal is the Best Value
     • Best application performance per dollar
     • Highest sustained application performance commitment
     • Best sustained application performance per MW
     • Excellent in-house testing facility and benchmarking/performance/support expertise at Cray
     • Easy to integrate into our facility
     • Acceptable risk

  7. Negotiation Challenges
     • Cray proposed two technologies:
       – XT5, available late 2009 with an interconnect refresh in 2010
         • Early cycles to users
         • Interconnect refresh incurs lengthy down time and hardware fallout
         • Older memory technology (DDR2)
         • Fewer cores per node
       – XE6, available mid 2010
         • Latest memory technology (DDR3)
         • Higher performance node
         • Latest interconnect delivered with the system
         • Delivered later
     • Two-phase delivery provides the best value

  8. Feedback from NERSC Users was Crucial to Architecting Hopper
     • User feedback: Login nodes need more memory → Hopper enhancement: 8 external login nodes with 128 GB of memory (with swap space)
     • User feedback: Connect the NERSC Global Filesystem to compute nodes → Hopper enhancement: the global file system will be available to compute nodes
     • User feedback: Workflow models are limited by memory on MOM (host) nodes → Hopper enhancements: increased the number of MOM nodes and the amount of memory on them; Phase 2 compute nodes can be repartitioned as MOM nodes

  9. Feedback from NERSC Users was Crucial to Architecting Hopper
     • User feedback: Improve stability and reliability → Hopper enhancements:
       – External login nodes will allow users to log in, compile, and submit jobs even when the computational portion of the machine is down
       – The external file system will allow users to access files if the compute system is unavailable, and will also give administrators more flexibility during system maintenances
       – For Phase 2, the Gemini interconnect has redundancy and adaptive routing

  10. Data and Batch Access
      [Diagram: external login nodes connect to the XE6 compute partition and mount the /project and /scratch file systems]
      – Compile applications and prepare input
      – Submit jobs when the XE6 is down
      – Access data on local and global filesystems when the XE6 is down

  11. Hopper System
      Phase 1 (XT5):
      • 668 nodes, 5,344 cores
      • 2.4 GHz AMD 4-core Opteron
      • 50 Tflop/s peak
      • 5 Tflop/s SSP
      • 11 TB DDR2 memory total
      • SeaStar2+ interconnect
      • 2 PB disk, 25 GB/s
      • Air cooled
      Phase 2 (XE6):
      • 6,384 nodes, 153,216 cores
      • 2.1 GHz AMD 12-core Opteron
      • 1.27 Pflop/s peak
      • 140 Tflop/s SSP
      • 217 TB DDR3 memory total
      • Gemini interconnect
      • 2 PB disk, 70 GB/s
      • Liquid cooled
      [Timeline: delivery spans 3Q09 through 4Q10]

  12. Hopper Phase 1 Installation
      [Photos: delivery, unwrap, install]

  13. Hopper Phase 1 Utilization
      [Utilization chart; annotations mark "Max 127k", system maintenance windows, and dedicated I/O testing]
      • Users were able to immediately utilize the Hopper system
      • Even with dedicated testing and maintenance times, Hopper utilization from Dec 15th to March 1st reached 90%

  14. Installation and Integration (photos from Tina Butler)
      [Photos: site preparation, unloading, up and running]

  15. Installation and Integration (photos from Tina Butler)
      [Photos: site preparation, unloading, up and running]
      Hopper places #5 on the TOP500 list at SC'10

  16. Hopper Early Hours
      [Chart: breakdown of early user hours by science area]
      • ~320 million early hours delivered to science offices, Nov 2010 to today
      • ~280 projects have used time
      • ~1000 users have accessed the system
      • Consistently 300 to 400 unique users logged into the system at any time

  17. Despite being a new, first-in-class petaflop system, Hopper has run at high utilization, with good stability from the start
      [Utilization chart annotations: scheduling, maintenance, security patch, CLE 3.1 UP03, hardware upgrade]
      • Over 81% utilization in the first 2.5 months (based on a 24-hour day, including maintenances)
      • System problems that would have been full outages on the XT4 and XT5 can be contained on the XE6
      • Room for scheduling improvements: pack large jobs together, stabilize the system further
      • Maintenances are a key source of lost utilization; look to minimize them
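
      For context, a utilization figure of this kind reduces to delivered core-hours divided by theoretically available core-hours. The hedged sketch below assumes that definition; the delivered-hours total and the period length are placeholders, not Hopper accounting data.

      ```python
      # Hedged sketch: utilization = delivered core-hours / available core-hours,
      # with availability counted over 24-hour days, maintenances included.
      # The delivered-hours figure is a placeholder, not Hopper accounting data.

      TOTAL_CORES = 153_216            # Hopper Phase 2 core count
      PERIOD_HOURS = 2.5 * 30 * 24     # roughly 2.5 months of 24-hour days

      delivered_core_hours = 225_000_000   # placeholder total from batch accounting
      available_core_hours = TOTAL_CORES * PERIOD_HOURS

      print(f"Utilization over the period: {delivered_core_hours / available_core_hours:.1%}")
      ```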

  18. Compared to the XT4 and XT5, most applications are seeing increased performance on Hopper
      [Chart: application runtime on Hopper relative to Franklin and Jaguar; values below 1.0 mean the application performs better on Hopper.
       Cores: CAM 240, GAMESS 1024, GTC 2048, IMPACT-T 1024, MAESTRO 2048, MILC 8192, PARATEC 1024]
      • Applications were run on Hopper, Franklin, and Jaguar at the same concurrency
      • All benchmarks are pure MPI (except GAMESS, which uses its own communication layer)
      • The significant improvement on Hopper for GAMESS is due to a new Cray library on the XE6 system; all other benchmarks use identical codes on Hopper, Jaguar, and Franklin
      Data from Nick Wright, Helen He, and Marcus Wagner

  19. Despite a slower clock speed, applications on Hopper perform better than on the XT4 or XT5
      Metric               Franklin                 Hopper
      Processor clock      2.3 GHz                  2.1 GHz
      MPI latency          ~6.5 us                  1.6 us
      MPI bandwidth        1.6 GB/sec/node          6.0 GB/sec/node
      Cache size           2 MB/socket shared L3    6 MB/6 cores shared L3
      Memory speed         800 MHz                  1333 MHz
      Memory bandwidth     ~2 GB/sec/core           ~2.2 GB/sec/core
      This is primarily due to the improved Gemini interconnect, and thus less time spent in communication by applications.
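
      A simple latency-plus-bandwidth (alpha-beta) message model, not taken from the slides, illustrates why the interconnect rows above matter; the message sizes are arbitrary examples.

      ```python
      # Hedged sketch: alpha-beta model of point-to-point message time,
      # t = latency + size / bandwidth. Interconnect figures come from the table
      # above; the message sizes are arbitrary illustrative choices.

      def message_time_us(size_bytes, latency_us, bandwidth_gb_per_s):
          """Estimated time to send one message, in microseconds."""
          return latency_us + size_bytes / (bandwidth_gb_per_s * 1e9) * 1e6

      systems = {
          "Franklin (SeaStar2+)": (6.5, 1.6),   # (MPI latency us, GB/s per node)
          "Hopper (Gemini)":      (1.6, 6.0),
      }

      for size in (8, 8 * 1024, 1024 * 1024):  # 8 B, 8 KB, 1 MB messages
          for name, (lat, bw) in systems.items():
              print(f"{size:>8} B on {name}: {message_time_us(size, lat, bw):8.1f} us")
      ```

      Under this model, small messages gain roughly the 4x latency improvement and large messages roughly the bandwidth improvement (a 1 MB transfer drops from about 660 us to under 180 us), consistent with the slide's point that applications spend less time in communication.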

  20. NERSC/Cray COE on Application Programming Models
      [Chart: GTC fusion application]

  21. Large Jobs are Running on the Hopper System
      [Chart: breakdown of computing hours by job size as a fraction of raw hours on Hopper's 153,216 cores; legend bins at >43%, <43%, <10%, and <1% of the machine]
      • Hopper is efficiently running jobs at all scales
      • During the availability period, over 50% of hours have been used for jobs larger than 16k cores
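
      A breakdown like the one charted here can be produced directly from batch accounting records. The sketch below is a hedged illustration: the job list is invented, and the bin edges only roughly mirror the chart legend.

      ```python
      # Hedged sketch: bin raw core-hours by job size as a fraction of the machine.
      # Job records are made-up; bin edges roughly mirror the chart legend.

      TOTAL_CORES = 153_216

      # (cores used, wall-clock hours) per job -- placeholder data
      jobs = [(512, 24.0), (4_096, 12.0), (16_384, 10.0), (49_152, 8.0), (131_072, 2.0)]

      bins = {"<1%": 0.0, "1-10%": 0.0, "10-43%": 0.0, ">43%": 0.0}
      for cores, hours in jobs:
          frac = cores / TOTAL_CORES
          key = "<1%" if frac < 0.01 else "1-10%" if frac < 0.10 else "10-43%" if frac < 0.43 else ">43%"
          bins[key] += cores * hours

      total = sum(bins.values())
      for key, core_hours in bins.items():
          print(f"{key:>6}: {core_hours / total:.1%} of raw hours")
      ```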

  22. Hopper is providing needed resources for DOE scientists
      • Over 320 M early hours delivered
      • First time a petaflop system is available to the general DOE research community
        – Production science runs
        – Code scalability testing
      • Hopper is a resilient system
        – Component failures are more easily isolated
        – It survives problems that cause full crashes on the XT4 and XT5
      • Researchers appreciate the stability of the system, and they want more time
      "The best part of Hopper is the ability to put previously unavailable computing resources towards investigations that would otherwise be unapproachable." – Hopper User

  23. Acknowledgements
      • This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
      • The authors would like to thank: Nick Wright and Helen He for providing early COE results; Manuel Vigil and the ACES team for valuable discussions and test time at the factory; and the Cray on-site and Cray Custom Engineering staff for valuable discussions.
