
An Overview Of High Performance Computing And Challenges For The Future



  1. An Overview Of High Performance Computing And Challenges For The Future. Jack Dongarra, University of Tennessee, Oak Ridge National Laboratory, University of Manchester. 2/13/2009.

  2. A Growth-Factor of a Billion in Performance in a Career: from Scalar through Super Scalar, Vector, Parallel, and Super Scalar/Special Purpose/Parallel machines (2X transistors/chip every 1.5 years), spanning EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, CDC 7600, IBM 360/195, Cray 1, Cray X-MP, Cray 2, TMC CM-2, Cray T3D, TMC CM-5, ASCI Red, ASCI White Pacific, and today's IBM Roadrunner and Cray Jaguar. Milestones in performance (floating-point operations per second, Flop/s):
     1941: 1 Flop/s
     1945: 100 Flop/s
     1949: 1,000 Flop/s (1 KiloFlop/s, KFlop/s)
     1951: 10,000 Flop/s
     1961: 100,000 Flop/s
     1964: 1,000,000 Flop/s (1 MegaFlop/s, MFlop/s)
     1968: 10,000,000 Flop/s
     1975: 100,000,000 Flop/s
     1987: 1,000,000,000 Flop/s (1 GigaFlop/s, GFlop/s)
     1992: 10,000,000,000 Flop/s
     1993: 100,000,000,000 Flop/s
     1997: 1,000,000,000,000 Flop/s (1 TeraFlop/s, TFlop/s)
     2000: 10,000,000,000,000 Flop/s
     2007: 478,000,000,000,000 Flop/s (478 TFlop/s)
     2009: 1,100,000,000,000,000 Flop/s (1.1 PetaFlop/s)
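The headline factor follows directly from the timeline above: a career running roughly from the mid-1960s (1 MFlop/s) to 2009 (1.1 PFlop/s) covers about a factor of a billion. The choice of 1964 as the career starting point is an illustrative assumption, not something stated on the slide.

    # Growth factor over a ~45-year career, using milestones from the timeline above.
    mflops_1964 = 1e6          # 1 MegaFlop/s, circa 1964 (CDC 6600 era)
    pflops_2009 = 1.1e15       # 1.1 PetaFlop/s, Roadrunner in 2009

    print(f"growth factor: {pflops_2009 / mflops_1964:.2e}")   # ~1.1e9, about a billion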

  3. The TOP500 list. H. Meuer, H. Simon, E. Strohmaier, and J. Dongarra: a listing of the 500 most powerful computers in the world. Yardstick: Rmax from the LINPACK benchmark (solve Ax=b for a dense problem; TPP performance rate). Updated twice a year: at SC'xy in the States in November and at the meeting in Germany in June. All data available from www.top500.org
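As a rough illustration of that yardstick (not the actual HPL code used for Rmax), the sketch below times a dense solve of Ax = b and reports a LINPACK-style rate using the conventional 2/3·n^3 + 2·n^2 operation count; the matrix size n is an arbitrary assumption.

    import time
    import numpy as np

    def linpack_like_rate(n=2000, seed=0):
        """Time a dense solve of Ax = b and report a LINPACK-style Gflop/s rate.

        Uses the conventional flop count 2/3*n^3 + 2*n^2 for an LU-based dense
        solve. This only illustrates the Rmax yardstick; it is not HPL.
        """
        rng = np.random.default_rng(seed)
        A = rng.standard_normal((n, n))
        b = rng.standard_normal(n)

        t0 = time.perf_counter()
        x = np.linalg.solve(A, b)          # LU factorization + triangular solves
        elapsed = time.perf_counter() - t0

        flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
        gflops = flops / elapsed / 1e9
        residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
        return gflops, residual

    if __name__ == "__main__":
        rate, res = linpack_like_rate()
        print(f"~{rate:.1f} Gflop/s, scaled residual {res:.2e}")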

  4. Performance Development (TOP500, 1993-2008, log scale from 100 Mflop/s to 100 Pflop/s): the SUM of all 500 systems has reached 16.9 PFlop/s, N=1 (the top system) is at 1.1 PFlop/s, and N=500 (the last system on the list) at 12.6 TFlop/s. In 1993 the corresponding values were about 1.17 TFlop/s (SUM), 59.7 GFlop/s (N=1), and 400 MFlop/s (N=500); roughly 6-8 years separate the N=1 and N=500 curves, and a "My Laptop" reference point sits at a few Gflop/s.

  5. Performance Development and Projections: extrapolating the TOP500 curves (SUM 16.9 PFlop/s, N=1 1.1 PFlop/s, N=500 12.6 TFlop/s today, versus 1.17 TFlop/s, 59.7 GFlop/s, and 400 MFlop/s in 1993) toward an Eflop/s. Landmarks and the concurrency they demand: Cray 2 at 1 Gflop/s with O(1) thread, ASCI Red at 1 Tflop/s with O(10^3) threads, RoadRunner at 1.1 Pflop/s with O(10^6) threads, and a projected 1 Eflop/s machine with O(10^9) threads. The annotations ~1000 years / ~1 year / ~8 hours / ~1 min. mark the time a fixed workload would take at each of these performance levels.
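A quick back-of-the-envelope check of those time-to-solution labels; the 1000-year baseline workload at 1 Gflop/s is taken from the slide, and the rest follows from dividing the same work by each machine's speed.

    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    # Baseline from the slide: a workload that needs ~1000 years at 1 Gflop/s.
    work_flops = 1000 * SECONDS_PER_YEAR * 1e9

    landmarks = [("Cray 2", 1e9), ("ASCI Red", 1e12),
                 ("RoadRunner", 1.1e15), ("Exascale system", 1e18)]

    for name, flops_per_s in landmarks:
        seconds = work_flops / flops_per_s
        if seconds >= SECONDS_PER_YEAR:
            label = f"~{seconds / SECONDS_PER_YEAR:.0f} year(s)"
        elif seconds >= 3600:
            label = f"~{seconds / 3600:.1f} hours"
        else:
            label = f"~{seconds / 60:.1f} minute(s)"
        print(f"{name} ({flops_per_s:.1e} flop/s): {label}")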

  6. Processors / Systems: pie chart of processor families across the 500 systems, including Xeon E54xx and L54xx (Harpertown), Xeon 51xx (Woodcrest), Xeon 53xx (Clovertown), dual- and quad-core Opteron, PowerPC 440, PowerPC 450, and POWER6. By vendor, Intel processors power roughly 71% of the systems, with AMD and IBM accounting for most of the rest.

  7. Cluster Interconnects: chart of system counts (0-300) per interconnect family, GigE, Myrinet, InfiniBand, and Quadrics, from 1999 to 2008.

  8. Efficiency: scatter of efficiency (Rmax/Rpeak, 0.00-1.00) against TOP500 ranking (1-500).

  9. Cores Per Socket (share of systems on the list): quad-core 67%, dual-core 31%, nine-core 7 systems, single-core 4 systems.

  10. Core Count: number of systems by total core count, 1993-2008, binned by powers of two from single-core and dual-core systems up through 64k-128k and 128k+ cores.

  11. Countries / System Share: United States 58%, United Kingdom 9%, France 5%, Germany 5%, Japan about 4%, China about 3%, with Italy, Sweden, India, Russia, Spain, and Poland at 1-2% each.

  12. Customer Segments: number of systems per segment, 1993-2008, across Industry, Academic, Research, Classified, Vendor, Government, and Others.

  13. Distribution of the Top500 (Rmax in Tflop/s versus rank): the list spans 1.1 Pflop/s at rank 1 down to 12.6 Tflop/s at rank 500; 2 systems exceed 1 Pflop/s, 19 exceed 100 Tflop/s, 51 exceed 50 Tflop/s, and 119 exceed 25 Tflop/s.

  14. Replacement Rate: number of systems replaced on each new list, 1993-2008 (about 267 on the most recent list).

  15. 32nd List: The TOP10
     Rank | Site | Computer | Country | Cores | Rmax [Tflops] | Rmax/Rpeak | Power [MW] | MF/W
     1 | DOE/NNSA/LANL | IBM / Roadrunner - BladeCenter QS22/LS21 | USA | 129,600 | 1,105.0 | 76% | 2.48 | 445
     2 | DOE/Oak Ridge National Laboratory | Cray / Jaguar - Cray XT5 QC 2.3 GHz | USA | 150,152 | 1,059.0 | 77% | 6.95 | 152
     3 | NASA/Ames Research Center/NAS | SGI / Pleiades - SGI Altix ICE 8200EX | USA | 51,200 | 487.0 | 80% | 2.09 | 233
     4 | DOE/NNSA/LLNL | IBM / eServer Blue Gene Solution | USA | 212,992 | 478.2 | 80% | 2.32 | 205
     5 | DOE/Argonne National Laboratory | IBM / Blue Gene/P Solution | USA | 163,840 | 450.3 | 81% | 1.26 | 357
     6 | NSF/Texas Advanced Computing Center/Univ. of Texas | Sun / Ranger - SunBlade x6420 | USA | 62,976 | 433.2 | 75% | 2.0 | 217
     7 | DOE/NERSC/LBNL | Cray / Franklin - Cray XT4 | USA | 38,642 | 266.3 | 75% | 1.15 | 232
     8 | DOE/Oak Ridge National Laboratory | Cray / Jaguar - Cray XT4 | USA | 30,976 | 205.0 | 79% | 1.58 | 130
     9 | DOE/NNSA/Sandia National Laboratories | Cray / Red Storm - XT3/4 | USA | 38,208 | 204.2 | 72% | 2.5 | 81
     10 | Shanghai Supercomputer Center | Dawning 5000A, Windows HPC 2008 | China | 30,720 | 180.6 | 77% | - | -
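The MF/W column is simply Rmax divided by the power draw; a quick check against two rows of the table, with values taken directly from the slide:

    # Megaflop/s per watt = Rmax / power, using values from the TOP10 table above.
    systems = {
        "Roadrunner": {"rmax_tflops": 1105.0, "power_mw": 2.48},
        "Jaguar XT5": {"rmax_tflops": 1059.0, "power_mw": 6.95},
    }

    for name, s in systems.items():
        mflops = s["rmax_tflops"] * 1e6          # Tflop/s -> Mflop/s
        watts = s["power_mw"] * 1e6              # MW -> W
        print(f"{name}: {mflops / watts:.0f} MF/W")
    # ~446 MF/W and ~152 MF/W, close to the table's 445 and 152; the small
    # difference comes from rounding the reported power figures.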

  16. 32nd List: The TOP10 (same table as the previous slide).

  17. LANL Roadrunner: A Petascale System in 2008. About 13,000 Cell HPC chips delivering ≈1.33 PetaFlop/s, paired with ≈7,000 dual-core Opterons, for ≈122,000 cores in total (one Cell chip for each Opteron core). Built from 17 "Connected Unit" clusters of 192 Opteron nodes each (180 of them with 2 dual-Cell blades connected with 4 PCIe x8 links), tied together by a second-stage InfiniBand 4x DDR interconnect (18 sets of 12 links to 8 switches). Based on the 100 Gflop/s (DP) Cell chip and the dual-core Opteron chip. Hybrid design: 2 kinds of chips and 3 kinds of cores, so programming is required at 3 levels (see the sketch below).
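A minimal sketch of what "programming at 3 levels" means in practice, assuming mpi4py is installed. The work-splitting and the offload_to_accelerator stand-in are hypothetical: the real machine used Cell-specific programming, not anything shown here.

    # Level 1: message passing between Opteron nodes (MPI).
    # Level 2: threads/processes across the cores of one node.
    # Level 3: offload of the inner compute kernel to the accelerator (the Cell on
    #          Roadrunner); here a plain Python function stands in for that step.
    from concurrent.futures import ThreadPoolExecutor

    import numpy as np
    from mpi4py import MPI


    def offload_to_accelerator(chunk: np.ndarray) -> float:
        """Stand-in for the accelerator kernel: in reality this would be SPE code."""
        return float(np.sum(chunk * chunk))


    def node_work(data: np.ndarray, n_threads: int = 2) -> float:
        """Level 2: split this node's data across host cores."""
        chunks = np.array_split(data, n_threads)
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return sum(pool.map(offload_to_accelerator, chunks))  # Level 3 per chunk


    comm = MPI.COMM_WORLD                      # Level 1: one rank per node
    local = np.arange(comm.rank * 4, comm.rank * 4 + 4, dtype=float)
    total = comm.allreduce(node_work(local), op=MPI.SUM)
    if comm.rank == 0:
        print("global sum of squares:", total)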

  18. ORNL's Newest System: Jaguar XT5. The XT4 and XT5 systems will be combined after acceptance of the new XT5 upgrade; each system will be linked to the file system through 4x-DDR InfiniBand.
     Metric | Jaguar Total | XT5 | XT4
     Peak Performance (TF) | 1,645 | 1,382 | 263
     AMD Opteron Cores | 181,504 | 150,176 | 31,328
     System Memory (TB) | 362 | 300 | 62
     Disk Bandwidth (GB/s) | 284 | 240 | 44
     Disk Space (TB) | 10,750 | 10,000 | 750
     Interconnect Bandwidth (TB/s) | 532 | 374 | 157
     (Office of Science)
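The XT5 peak figure can be sanity-checked from the core count above and the 2.3 GHz clock listed on the TOP10 slide. The 4 flops/cycle figure for these quad-core Opterons is an assumption about the core microarchitecture, not something stated in the slides.

    # Theoretical peak (Rpeak) for the Jaguar XT5 partition.
    cores = 150_176                 # AMD Opteron cores, from the table above
    clock_ghz = 2.3                 # per-core clock, from the TOP10 slide (Jaguar XT5 QC 2.3 GHz)
    flops_per_cycle = 4             # assumed: 4 double-precision flops/cycle per Opteron core

    rpeak_tf = cores * clock_ghz * flops_per_cycle / 1000.0   # Gflop/s -> Tflop/s
    print(f"Rpeak ~= {rpeak_tf:,.0f} TF")                     # ~1,382 TF, matching the table

    # Measured Rmax from the TOP10 slide versus this peak:
    rmax_tf = 1059.0
    print(f"efficiency Rmax/Rpeak ~= {rmax_tf / rpeak_tf:.0%}")   # ~77%, as listed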

  19. UT's HPC System: the University of Tennessee's National Institute for Computational Sciences, housed at ORNL and operated for the NSF; the system is named Kraken. Today: Cray XT5 (608 TF) + Cray XT4 (167 TF). XT5: 16,512 sockets, 66,048 cores. XT4: 4,512 sockets, 18,048 cores. Number 15 on the TOP500.

  20. Power is an Industry-Wide Problem. Google facilities are leveraging hydroelectric power and old aluminum plants ("Hiding in Plain Sight, Google Seeks More Power", John Markoff, June 14, 2006). Microsoft and Yahoo are building big data centers upstream in Wenatchee and Quincy, Wash. to keep up with Google, which means they need cheap electricity and readily accessible data networking; Microsoft's Quincy, Wash. facility is 470,000 sq ft and 47 MW.

  21. ORNL/UTK Power Cost Projections 2007-2011. Over the next 5 years ORNL/UTK will deploy 2 large petascale systems: they are using 4 MW today, going to 15 MW before year end, and by 2012 could be using more than 50 MW. Cost estimates are based on $0.07 per kWh and include both the DOE and NSF systems.
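At that rate the electricity bill alone is substantial; a quick estimate follows, assuming continuous operation at the quoted loads (the 24/7 duty cycle is an added assumption).

    # Annual electricity cost at $0.07 per kWh, assuming the load runs 24/7.
    RATE_PER_KWH = 0.07
    HOURS_PER_YEAR = 24 * 365

    for megawatts in (4, 15, 50):
        kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
        cost = kwh_per_year * RATE_PER_KWH
        print(f"{megawatts:>2} MW -> ${cost / 1e6:.1f}M per year")
    # 4 MW -> ~$2.5M, 15 MW -> ~$9.2M, 50 MW -> ~$30.7M per year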

  22. Something's Happening Here... (from K. Olukotun, L. Hammond, H. Sutter, and B. Smith). In the "old days" processors would become faster each year; today the clock speed is fixed or getting slower. Things are still doubling every 18-24 months, but it is Moore's Law reinterpreted: the number of cores doubles every 18-24 months. A hardware issue just became a software problem.

  23. Power Cost of Frequency:
     Power ∝ Voltage^2 × Frequency (V^2 F)
     Frequency ∝ Voltage
     Therefore Power ∝ Frequency^3
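To see why this relationship pushed the industry toward multicore, consider the following comparison (an added illustration, not taken from the slide): doubling one core's clock buys 2x the performance for roughly 8x the power, while two cores at the original clock give roughly 2x the performance for only 2x the power.

    # Power ∝ V^2 * F and F ∝ V imply Power ∝ F^3 (everything normalized to 1).
    def relative_power(freq_ratio: float) -> float:
        return freq_ratio ** 3

    # Option A: one core with the clock doubled.
    perf_a, power_a = 2.0, relative_power(2.0)            # 2x perf, 8x power

    # Option B: two cores at the original clock (assuming perfect parallel scaling).
    perf_b, power_b = 2.0, 2 * relative_power(1.0)        # 2x perf, 2x power

    print(f"double the clock : {perf_a}x perf at {power_a}x power")
    print(f"two cores        : {perf_b}x perf at {power_b}x power")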
