NSF Workshop on the Future of High Performance Computing • Washington DC December 4, 2009
NSF Future of High Performance Computing
Bill Kramer
Why Sustained Performance is the Critical Focus
- Memory Wall
- The limitation on computation speed caused by the growing disparity between processor speed and memory latency and bandwidth
- From 1986 to 2000, processor speed increased at an annual rate of 55%, while memory speed improved by only 10% per year
- Issue
- Memory latency and bandwidth limitations within a processor make it difficult to achieve a major fraction of a chip's peak performance
- Latency and bandwidth limitations of the communication fabric make it difficult to scale science and engineering applications to large numbers of processors
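The memory-bandwidth limitation above can be illustrated with a simple roofline-style bound (my sketch, not from the slides): attainable performance is capped by the smaller of peak compute rate and memory bandwidth times arithmetic intensity. All numbers below are hypothetical, chosen only to show the effect.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on sustained performance for a kernel with the
    given arithmetic intensity (flops per byte moved from memory)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# A hypothetical chip with 100 GF/s peak but only 25 GB/s of memory bandwidth:
peak, bw = 100.0, 25.0
for intensity in (0.25, 1.0, 4.0, 16.0):
    frac = attainable_gflops(peak, bw, intensity) / peak
    print(f"intensity {intensity:5.2f} flops/byte -> "
          f"{frac:.0%} of peak attainable")
```

Low-intensity (memory-bound) kernels reach only a small fraction of peak, which is why sustained rather than peak performance is the meaningful measure.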
[Charts: Relationship between Peak, Linpack, and Sustained Performance using SSP for NERSC systems, 1997-2008. One panel shows the ratio of Linpack to SSP over time; the other shows Peak (TF), Linpack (TF), and Normalized SSP (TF) in TF/s.]
Recommendation
- Adopt a longer term focus, rather than the three-to-five-year focus, which is really just the useful lifetime of a single system.
- Achieving and using an Exascale system, or the equivalent of tens of 100-Petascale systems, will span 15 years and a progression of resource deployments.
- NSF will be well served to create a 15 year funding program that combines the total cost of acquiring, supporting, and using the resources.
- This strategy should include creating a supporting facility infrastructure that allows technology refreshes to be quickly deployed and integrated with existing resources.
- To enable effective resource insertion, NSF should separate the selection of organizations that provision and support HPC resources from the selection of the resources themselves.
- The current NSF practice of issuing solicitations that tie an organization as service provider to a single system choice for each resource refresh leads to sub-optimization that can yield neither the most effective organization nor the best value technology.
- Focus on true application sustained performance.
- Using something like “Sustained System Performance” to determine the best value resource solutions will
enable NSF to have the most cost effective computing environments for the computational science communities.
- Use state-of-the-practice open, best value procurements that enable comparing technology choices on sustained performance while allowing vendors flexibility.
- NSF should take the lead in redefining the debate – away from simple metrics and TOP500 and towards
meaningful measures for science.
Recommendation
- NSF should follow the industry trend of concentrating its computational and data storage resources at a few locations that can then make long term investments amortized over a series of technology refreshes.
- These locations should be chosen on an organization's ability to manage large scale, early release systems; support an evolving computational science community; provide cost effective extreme scale infrastructure; and attract and engage world class computer science and computational science staff.
- The NSF should develop an appropriate balance of ‘production quality’ and
‘experimental’ resources.
- Production quality means systems from well known architectures (albeit possibly early delivery versions of new generations) with proven Performance, Effectiveness, Reliability, Consistency, and Usability for the primary mission of use by computational science.
- "Experimental" resources are those with the potential to be disruptive technologies leading to significant (~10x) performance and/or price performance improvements.
- These two types of systems have clearly different missions.
- A typical investment strategy might be 85% production/15% experimental.
- NSF should establish a "best practice" review of both US funded resources and international funding programs.
- NSF should invest in “performance based design” for all application areas.
Geographic Distribution of PRAC Leaders
Recommendation
- NSF should separate the provisioning of a national science network from middleware software and/or compute and storage resource provisioning.
- A national science network that serves extreme scale computational and data resources, major communities of computational and data scientists, and major observational and experimental resources needs a long term roadmap with consistent funding and a plan for technology insertion. A model for such a plan can be found in DOE's ESnet program, among others.
- NSF should likewise have a sustained program for distributed (aka cloud) middleware software creation and support.
- This support needs to be synchronized with the computational, data and networking
components of the NSF strategy, but needs to be an independent program component.
- NSF should support expanded development and evolution of extreme scale system
software aligned with the IESP roadmap.
- There are contract arrangements that can assure both high quality systems and services, and innovation and advanced technology, in whatever balance NSF needs.
- Performance and Rewards based contracts
- Deployment Project Management and ongoing operational assessments à la ITIL
- Example: an agreement with a 6 year base term, renewable for up to a total of 16 years
- Automatic as well as discretionary extensions that benefit both NSF and the providing organizations
ADDITIONAL SLIDES
A Generalized Sustained System Performance (SSP) Framework
- Is an effective and flexible way to evaluate systems
- Determine the Sustained System Performance for each phase of each system:
1. Establish a set of performance tests that reflect the intended work the system will do
- Can be any number of tests, as long as they have a common measure of performance
2. A test consists of a code and a problem set
3. Establish the amount of work (ops) the test needs to do for a fixed concurrency or a fixed problem set
4. Time each test execution, using wall-clock time
5. Determine the amount of work done per scalable unit (node, socket, core, task, thread, interface, etc.)
- Work = total operations / (total time × number of scalable units used for the test)
6. Composite the work per scalable unit across all tests
- Composite functions are based on circumstances and test selection criteria
- Can be weighted or not, as desired
7. Determine the SSP of a system at any time period by multiplying the composite work per scalable unit by the number of scalable units in the system
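The seven steps above can be sketched in a few lines of Python. The test names, measurements, and the choice of geometric mean as the composite function are all illustrative assumptions; the framework itself leaves the composite function open.

```python
import math

# Hypothetical per-test measurements: each test ran a fixed problem on
# `units` scalable units (e.g. nodes), doing `ops` total operations in
# `seconds` of wall-clock time. Names and numbers are invented.
tests = [
    {"name": "cfd", "ops": 4.8e15, "seconds": 1200.0, "units": 512},
    {"name": "md",  "ops": 2.1e15, "seconds":  900.0, "units": 512},
    {"name": "qcd", "ops": 9.6e15, "seconds": 1500.0, "units": 1024},
]

def per_unit_work(t):
    # Step 5: work rate per scalable unit = ops / time / units
    return t["ops"] / t["seconds"] / t["units"]

def composite(rates):
    # Step 6: one common (unweighted) choice is the geometric mean;
    # other, possibly weighted, composite functions are allowed.
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

def ssp(tests, system_units):
    # Step 7: SSP = composite per-unit work * number of units in the system
    return composite([per_unit_work(t) for t in tests]) * system_units

print(f"SSP for a 10,000-unit system: {ssp(tests, 10_000):.3e} ops/s")
```

Because every test reports work per scalable unit in a common measure, the same composite can be applied to systems of very different sizes and architectures.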
Examples of Using the SSP Framework
- Test a system upon delivery, use to select a system, etc.
- Determine the Potency of the system: how well the system will perform the expected work over some time period
- Potency is the sum, over the specified time, of the products of each of the system's SSP values and the time period over which that SSP holds
- Different SSPs for different periods
- Different SSPs for different types of computation units (heterogeneous)
- Determine the Cost of systems
- Cost can be any resource units ($, Watts, space…) and with any complexity (Initial, TCO,…)
- Determine the Value of the system
- Value is the Potency divided by a cost function
- If needed, compare the value of different system alternatives or compare against expectations
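The Potency and Value definitions above reduce to a short computation. A minimal sketch, with invented phase data and an invented total cost:

```python
# Hypothetical sketch of Potency and Value as defined on this slide.
# SSP may differ across periods of a system's life (e.g. after an
# upgrade), so Potency sums SSP * duration over the phases; Value
# divides Potency by a cost function. All figures are invented.

phases = [
    # (SSP in TF/s, duration of that phase in years)
    (100.0, 2.0),   # initial delivery
    (160.0, 3.0),   # after a mid-life technology refresh
]

def potency(phases):
    # Sum over phases of SSP times the duration of that phase
    return sum(ssp * years for ssp, years in phases)

def value(phases, cost):
    # Cost can be in any resource unit ($, watts, floor space, ...)
    return potency(phases) / cost

print(f"Potency: {potency(phases)} TF-years")
print(f"Value:   {value(phases, 50.0):.1f} TF-years per $M (assumed $50M TCO)")
```

Computing Value for each candidate system against a common cost unit gives a direct basis for comparing alternatives or checking delivered systems against expectations.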