The Hopper System: How the Largest* XE6 in the World Went From Requirements to Reality
Katie Antypas, Tina Butler, and Jonathan Carter
CUG 2011, May 25th, 2011
Requirements to Reality
- Develop RFP
- Select vendor partner
- Negotiate SOW
- Deliver and Test System
- Transition to Steady State
RFP Draws from User Requirements
- 13 ‘Minimum Requirements’ (e.g., 24x7 support) that absolutely must be met
  – Proposals that do not meet them are non-responsive and are not evaluated further
- 38 ‘Performance Features’ (e.g., fully featured development environment), a wish list of desired capabilities
  – Evaluated qualitatively via in-depth study of the Offeror narrative
- Benchmarks
- Supplier attributes (ability to produce/test, corporate risk, commitment to HPC, etc.)
- Cost of ownership (incl. life-cycle, facilities, base, and ongoing costs) and affordability
- Best Value Source Selection allows us to evaluate and select the proposal that represents the best value
NERSC-6 Benchmarks
- System component tests: Stream, PSNAP, Multipong, IOR, MetaBench, NetPerf
- Kernels: NPB Serial, NPB Class D, UPC NPB, FCT
- Stripped-down app / composite tests: AMR Elliptic Solve
- Full applications: CAM, GTC, MILC, GAMESS, PARATEC, IMPACT-T, MAESTRO
- Full workload: SSP, consistency tests
NERSC-6 SSP Metric
The run time of each full application benchmark at its largest concurrency is used to calculate the SSP.
NERSC-6 SSP benchmark concurrencies: CAM 240, GAMESS 1,024, GTC 2,048, IMPACT-T 1,024, MAESTRO 2,048, MILC 8,192, PARATEC 1,024 cores
For each benchmark, measure:
- FLOP counts on a reference system
- Wall-clock run time on various systems
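These measurements combine into a single number. As a rough illustration, here is a minimal Python sketch, assuming the standard NERSC SSP formulation (per-core rate for each benchmark = reference flops / (cores × time), combined by geometric mean and scaled by the system's core count); the flop counts and run times below are placeholders, not the actual NERSC-6 reference values.

```python
from math import prod

# Placeholder inputs, NOT the real NERSC-6 values:
# name -> (required cores, reference flop count in Gflop, wall-clock seconds)
benchmarks = {
    "CAM":      (240,   1.0e5, 350.0),
    "GAMESS":   (1024,  2.0e6, 1200.0),
    "GTC":      (2048,  3.0e6, 900.0),
    "IMPACT-T": (1024,  1.5e6, 700.0),
    "MAESTRO":  (2048,  2.5e6, 1100.0),
    "MILC":     (8192,  8.0e6, 800.0),
    "PARATEC":  (1024,  2.0e6, 600.0),
}

def ssp_gflops(system_cores: int) -> float:
    """SSP = system core count * geometric mean of per-core rates (Gflop/s)."""
    rates = [gflop / (cores * seconds)          # per-core rate for each app
             for cores, gflop, seconds in benchmarks.values()]
    return system_cores * prod(rates) ** (1.0 / len(rates))

print(f"SSP ~ {ssp_gflops(153_216) / 1e3:.0f} Tflop/s (illustrative inputs only)")
```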
Cray Proposal is the Best Value
- Best application performance per dollar
- Highest sustained application performance commitment
- Best sustained application performance per MW
- Excellent in-house testing facility and benchmarking/performance/support expertise at Cray
- Easy to integrate into our facility
- Acceptable risk
Negotiation Challenges
- Cray proposed two technologies
– XT5 available late 2009, with an interconnect refresh in 2010
  - Early cycles to users
  - Interconnect refresh incurs lengthy down time and hardware fallout
  - Older memory technology (DDR2)
  - Fewer cores per node
– XE6 available mid 2010
  - Latest memory technology (DDR3)
  - Higher-performance node
  - Latest interconnect delivered with the system
  - Delivered later
- Two-phase delivery provides the best value
Feedback from NERSC Users was crucial to architecting Hopper
User Feedback → Hopper Enhancement
- Login nodes need more memory → 8 external login nodes with 128 GB of memory (with swap space)
- Connect the NERSC Global Filesystem to compute nodes → The global file system will be available to compute nodes
- Workflow models are limited by memory on MOM (host) nodes → Increased number of MOM nodes with more memory per node; Phase 2 compute nodes can be repartitioned as MOM nodes
Feedback from NERSC users was crucial to architecting Hopper
User Feedback → Hopper Enhancement
- Improve stability and reliability:
  – External login nodes allow users to log in, compile, and submit jobs even when the computational portion of the machine is down
  – An external file system allows users to access files if the compute system is unavailable, and gives administrators more flexibility during system maintenances
  – For Phase 2, the Gemini interconnect has redundancy and adaptive routing
Data and Batch Access
Login nodes mount the file systems:
– Compile applications and prepare input
– Submit jobs when the XE6 is down
– Access data on local and global file systems when the XE6 is down
[Diagram: login nodes connect to the XE6 compute partition and to the /scratch and /project file systems]
Hopper System
Phase 1 - XT5
- 668 nodes, 5,344 cores
- 2.4 GHz AMD 4-core Opteron
- 50 Tflop/s peak
- 5 Tflop/s SSP
- 11 TB DDR2 memory total
- SeaStar2+ Interconnect
- 2 PB disk, 25 GB/s
- Air cooled
Phase 2 - XE6
- 6,384 nodes, 153,216 cores
- 2.1 GHz AMD 12-core Opteron
- 1.27 Pflop/s peak
- 140 Tflop/s SSP
- 217 TB DDR3 memory total
- Gemini Interconnect
- 2 PB disk, 70 GB/s
- Liquid cooled
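As a back-of-the-envelope check on the peak figures above, the sketch below multiplies nodes × cores per node × clock; the factor of 4 double-precision flops per clock per Opteron core is an assumption not stated on the slide, and the results land close to the quoted 50 Tflop/s and ~1.27 Pflop/s.

```python
def peak_gflops(nodes: int, cores_per_node: int, ghz: float, flops_per_clock: int = 4) -> float:
    """Theoretical peak = nodes * cores/node * clock (GHz) * flops per clock."""
    return nodes * cores_per_node * ghz * flops_per_clock

# Phase 1 (XT5): 668 nodes, 8 cores/node (dual-socket quad-core), 2.4 GHz
print(f"Phase 1 peak ~ {peak_gflops(668, 8, 2.4) / 1e3:.0f} Tflop/s")      # ~51
# Phase 2 (XE6): 6,384 nodes, 24 cores/node (dual-socket 12-core), 2.1 GHz
print(f"Phase 2 peak ~ {peak_gflops(6384, 24, 2.1) / 1e6:.2f} Pflop/s")    # ~1.29
```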
[Timeline: phased delivery from 3Q09 through 4Q10]
Hopper Phase 1 Installation
[Photos: delivery, unwrap, install]
Hopper Phase I Utilization
- Users were able to immediately utilize the Hopper system
- Even with dedicated testing and maintenance times, Hopper utilization from Dec 15th to March 1st reached 90%
[Utilization chart, max 127K; dips mark system maintenances and dedicated I/O testing]
[Photos from Tina Butler: site preparation, unloading, installation and integration, up and running!]
Hopper places #5 on the TOP500 list at SC’10
Hopper Early Hours
- ~320 million early hours delivered to science offices
- ~280 projects have used time
- ~1,000 users have accessed the system
- Consistently 300-400 unique users logged into the system at any time
[Pie chart: breakdown of early user hours by science area, Nov 2010 – today]
Despite being a new, first-in-class petaflop system, Hopper has run at high utilization, with good stability from the start
- Over 81% utilization in the first 2.5 months (based on a 24-hour day, including maintenances)
- System problems that would have been full outages on the XT4 and XT5 can be contained on the XE6
- Room for scheduling improvements: pack large jobs together, stabilize the system further
- Maintenances are a key source of lost utilization; look to minimize them
[Utilization chart: dips annotated as maintenance, security patch, CLE 3.1 UP03 upgrade, hardware maintenance, and scheduling problems]
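For reference, a minimal sketch of the utilization metric quoted above: core-hours delivered divided by every core-hour in the wall-clock period, counting 24 hours per day with no allowance for maintenance; the delivered-hours figure here is made up for illustration.

```python
def utilization(delivered_core_hours: float, cores: int, days: float) -> float:
    """Fraction of all theoretically available core-hours that were delivered,
    counting 24 hours per day and making no allowance for maintenance."""
    return delivered_core_hours / (cores * days * 24.0)

# Illustrative only: ~2.5 months (76 days) on 153,216 cores
print(f"utilization ~ {utilization(2.3e8, 153_216, 76):.0%}")   # ~82% with these inputs
```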
Compared to the XT4 and XT5, most applications see increased performance on Hopper
- Applications run on Hopper, Franklin, and Jaguar at the same concurrency
- All benchmarks are pure MPI (except GAMESS, which uses its own communication layer)
- Significant improvement on Hopper for GAMESS due to a new Cray library on the XE6 system; all other benchmarks use identical codes on Hopper, Jaguar, and Franklin
[Chart: relative run times for CAM (240 cores), GAMESS (1,024), GTC (2,048), IMPACT-T (1,024), MAESTRO (2,048), MILC (8,192), and PARATEC (1,024); values below 1.0 indicate the application performs better on Hopper]
Data from Nick Wright, Helen He and Marcus Wagner
Despite a slower clock speed, applications on Hopper perform better than the XT4 or XT5…
Metric             Franklin                Hopper
Proc clock speed   2.3 GHz                 2.1 GHz
MPI latency        ~6.5 us                 1.6 us
MPI bandwidth      1.6 GB/sec/node         6.0 GB/sec/node
Cache size         2 MB/socket shared L3   6 MB/6 cores shared L3
Memory speed       800 MHz                 1333 MHz
Memory bandwidth   ~2 GB/sec/core          ~2.2 GB/sec/core
This is primarily due to the improved Gemini interconnect, which means applications spend less time in communication.
NERSC/Cray COE on Application Programming Models
GTC Fusion Application
Large Jobs are Running on the Hopper System
[Chart: breakdown of computing hours (raw hours) by job size]
- Hopper is efficiently running jobs at all scales
- During the availability period, over 50% of hours have been used for jobs larger than 16K cores
[Job-size bins as a fraction of Hopper’s 153,216 cores: <1%, <10%, <43%, >43%]
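A sketch of how the job-size breakdown in the chart could be tallied from accounting records; the record format, the sample jobs, and the exact bin edges are assumptions for illustration, with bins expressed as a fraction of Hopper's 153,216 cores.

```python
HOPPER_CORES = 153_216

# Bin labels and upper edges as a fraction of the full machine (last bin catches the rest).
BINS = [("<1%", 0.01), ("<10%", 0.10), ("<43%", 0.43), (">43%", float("inf"))]

def hours_by_job_size(jobs):
    """jobs: iterable of (cores, core_hours) tuples from accounting data (hypothetical format)."""
    totals = {label: 0.0 for label, _ in BINS}
    for cores, core_hours in jobs:
        fraction = cores / HOPPER_CORES
        label = next(lbl for lbl, edge in BINS if fraction < edge)
        totals[label] += core_hours
    return totals

# Made-up sample records: (cores used, core-hours charged)
sample = [(1024, 5.0e5), (16_384, 2.0e6), (65_536, 3.0e6), (131_072, 1.5e6)]
print(hours_by_job_size(sample))
```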
Hopper is providing needed resources for DOE Scientists
- Over 320 M early hours delivered
- First time a petaflop system is available to the general DOE research community
  – Production science runs
  – Code scalability testing
“The best part of Hopper is the ability to put previously unavailable computing resources towards investigations that would otherwise be unapproachable.” – Hopper user
- Hopper is a resilient system
  – Component failures are more easily isolated
  – Survives problems that cause full crashes on the XT4 and XT5
- Researchers appreciate the stability of the system, and they want more time
Acknowledgements
- This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
- The authors would like to thank: Nick Wright and