NSF Future of High Performance Computing - Bill Kramer (PowerPoint presentation)


SLIDE 1

NSF Workshop on the Future of High Performance Computing • Washington DC December 4, 2009

NSF Future of High Performance Computing

Bill Kramer

SLIDE 2


Why Sustained Performance is the Critical Focus

  • Memory Wall
  • The limit on computation speed caused by the growing disparity between processor speed and memory latency and bandwidth
  • From 1986 to 2000, processor speed increased at an annual rate of 55%, while memory speed improved by only 10% per year
  • Issue
  • Memory latency and bandwidth limitations within the processor make it difficult to achieve a major fraction of a chip's peak performance
  • Latency and bandwidth limitations of the communication fabric make it difficult to scale science and engineering applications to large numbers of processors
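Compounding the two growth rates quoted above shows how quickly the gap widens. A minimal sketch, assuming exactly the slide's rates (+55%/yr processor, +10%/yr memory) over 1986-2000:

```python
# Compound the annual growth rates quoted on the slide to estimate how far
# processor speed pulled ahead of memory speed from 1986 to 2000.
# The rates are the slide's figures; the compounding model is an assumption.
cpu_rate, mem_rate = 1.55, 1.10   # +55%/yr vs +10%/yr
years = 2000 - 1986               # 14 years

cpu_growth = cpu_rate ** years    # total processor speedup over the period
mem_growth = mem_rate ** years    # total memory speedup over the period
gap = cpu_growth / mem_growth     # relative processor/memory gap

print(f"Processor ~{cpu_growth:.0f}x, memory ~{mem_growth:.1f}x, gap ~{gap:.0f}x")
```

Under these assumptions the processor/memory speed gap grows by roughly two orders of magnitude over the period, which is why sustained (rather than peak) performance became the limiting concern.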

[Figure: Relationship between Peak, Linpack, and Sustained Performance using SSP. Two charts for NERSC systems, 1997 to 2008: the ratio of Linpack to SSP (roughly 5 to 20), and Peak (TF), Linpack (TF), and normalized SSP (TF) on a 0 to 400 TF/s scale.]

SLIDE 3


Recommendation

  • Adopt a longer-term focus, rather than the three- to five-year focus, which is really just the useful lifetime of a single system.
  • Achieving and using an Exascale system, or the equivalent of tens of 100-Petascale systems, will span 15 years and a progression of resource deployments.
  • NSF would be well served to create a 15-year funding program that combines the total cost of acquiring, supporting, and using the resources.
  • This strategy should include creating a supporting facility infrastructure that allows efficient technology refreshes to be quickly deployed and integrated with existing resources.
  • To enable effective resource insertion, NSF should separate the selection of organizations that provision and support HPC resources from the resource selection itself.
  • The current NSF practice of issuing separate solicitations that combine an organization as a service provider and a sole system choice for each resource refresh leads to sub-optimization, which can result in neither the most effective organization nor the best-value technology.
  • Focus on true sustained application performance.
  • Using something like "Sustained System Performance" to determine the best-value resource solutions will enable NSF to have the most cost-effective computing environments for the computational science communities.
  • Use state-of-the-practice open, best-value procurements that enable comparing technology choices on sustained performance while allowing vendors flexibility.
  • NSF should take the lead in redefining the debate, away from simple metrics and the TOP500 and toward meaningful measures for science.


SLIDE 4


Recommendation

  • NSF should follow the industry trend of concentrating its computational and data storage resources at a few locations, which can then make long-term investments amortized over a series of technology refreshes.
  • These locations should be determined by the organizations' ability to manage large-scale, early-release systems, support an evolving computational science community, provide cost-effective extreme-scale infrastructure, and attract and engage world-class computer science and computational science staff.
  • NSF should develop an appropriate balance of 'production quality' and 'experimental' resources.
  • Production quality means systems from well-known architectures (albeit possibly early-delivery versions of new generations) with proven Performance, Effectiveness, Reliability, Consistency, and Usability for the primary mission of use by computational science.
  • 'Experimental' resources are those with the potential to be disruptive technology, leading to significant (~10x) performance and/or price-performance improvements.
  • The missions of these two types of systems are clearly different.
  • A typical investment strategy might be 85% production / 15% experimental.
  • NSF should establish a "best practice" review of both US-funded resources and international funding programs.
  • NSF should invest in "performance-based design" for all application areas.


SLIDE 5


Geographic Distribution of PRAC Leaders

SLIDE 6


Recommendation

  • NSF should separate the provisioning of a national science network from middleware software and/or compute and storage resource provisioning.
  • A national science network that serves extreme-scale computational and data resources, major communities of computational and data scientists, and major observational and experimental facilities needs a long-term roadmap with consistent funding and a plan for technology insertion. A model for such a plan can be found in DOE's ESnet program, among others.
  • NSF should likewise have a sustained program for distributed (aka cloud) middleware software creation and support.
  • This support needs to be synchronized with the computational, data, and networking components of the NSF strategy, but needs to be an independent program component.
  • NSF should support expanded development and evolution of extreme-scale system software aligned with the IESP roadmap.
  • There are contract arrangements that can assure both high-quality systems and services and innovation and advanced technology in whatever balance NSF needs:
  • Performance- and rewards-based contracts
  • Deployment project management and ongoing operational assessments a la ITIL
  • Example: an agreement with a 6-year base term, renewable for up to a total of 16 years
  • Automatic as well as discretionary extensions that benefit both NSF and the providing organizations


SLIDE 7


ADDITIONAL SLIDES


SLIDE 8


A Generalized Sustained System Performance (SSP) Framework

  • An effective and flexible way to evaluate systems
  • Determine the Sustained System Performance for each phase of each system:

1. Establish a set of performance tests that reflect the intended work the system will do
  • Can be any number of tests, as long as they share a common measure of performance
2. A test consists of a code and a problem set
3. Establish the amount of work (operations) the test needs to do for a fixed concurrency or a fixed problem set
4. Time each test execution, using wall-clock time
5. Determine the amount of work done per scalable unit (node, socket, core, task, thread, interface, etc.)
  • Work per unit = total operations / total time / number of scalable units used for the test
6. Composite the per-unit work across all tests
  • Composite functions are chosen based on circumstances and test selection criteria
  • Can be weighted or not, as desired
7. Determine the SSP of a system at any time period by multiplying the composite per-unit work by the number of scalable units in the system
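The steps above can be sketched in code. This is an illustrative sketch, not NERSC's actual SSP tooling: the test names, operation counts, system size, and the choice of an unweighted geometric mean as the composite function are all assumptions.

```python
from math import prod

# Hypothetical benchmark results: (total operations, wall-clock seconds,
# scalable units used). Names and numbers are invented, not real SSP tests.
tests = {
    "app_a": (2.0e15, 1000.0, 512),   # 2 Pflop of work on 512 nodes
    "app_b": (8.0e14, 2500.0, 256),
    "app_c": (1.2e15, 1500.0, 512),
}

# Steps 3-5: work per scalable unit = operations / time / units (flop/s per node)
per_unit = {name: ops / secs / units for name, (ops, secs, units) in tests.items()}

# Step 6: composite the per-unit rates; an unweighted geometric mean is one
# common choice of composite function (assumed here)
composite = prod(per_unit.values()) ** (1 / len(per_unit))

# Step 7: SSP = composite per-unit performance x scalable units in the system
system_units = 4096
ssp = composite * system_units
print(f"SSP ~ {ssp / 1e12:.2f} TF/s for a {system_units}-node system")
```

Because every test reduces to a common per-unit rate, any mix of applications with different problem sizes and concurrencies can be composited into one sustained-performance figure.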


SLIDE 9


Examples of Using the SSP Framework

  • Uses: test a system upon delivery, select a system, etc.
  • Determine the Potency of the system: how well the system will perform the expected work over some time period
  • Potency is the sum, over the specified time, of the product of each of the system's SSP values and the time period for which that SSP holds
  • Different SSPs for different periods
  • Different SSPs for different types of computation units (heterogeneous systems)
  • Determine the Cost of systems
  • Cost can be in any resource units ($, Watts, space, ...) and of any complexity (initial cost, TCO, ...)
  • Determine the Value of the system
  • Value is the potency divided by a cost function
  • If needed, compare the value of different system alternatives, or compare against expectations
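The potency and value definitions above reduce to a short calculation. A minimal sketch, where the phase durations, SSP figures, and cost number are invented purely for illustration:

```python
# Hypothetical deployment phases: (SSP in TF/s, years spent at that SSP).
# A mid-life upgrade raises the SSP for the second phase; numbers are invented.
phases = [(8.0, 2.0), (16.0, 3.0)]

# Potency: sum over phases of SSP x time at that SSP (TF-years here)
potency = sum(ssp * years for ssp, years in phases)

# Cost can be in any resource units; total cost of ownership in $M as an example
tco_musd = 40.0

# Value = potency / cost function
value = potency / tco_musd
print(f"Potency = {potency} TF-years, Value = {value:.1f} TF-years per $M")
```

Expressing each alternative as potency per unit cost lets procurements compare dissimilar systems (or a system against expectations) on one best-value number.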
