Supercomputers and Clusters and Grids, Oh My! - PowerPoint PPT Presentation



SLIDE 1

Supercomputers and Clusters and Grids, Oh My!

Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory, and SFI Walton Visitor, University College Dublin

SLIDE 2

Apologies to Frank Baum…

Dorothy: “Do you suppose we'll meet any wild animals?” Tinman: “We might.” Scarecrow: “Animals that ... that eat straw?” Tinman: “Some. But mostly lions, and tigers, and bears.” All: “Lions and tigers and bears, oh my! Lions and tigers and bears, oh my!” Supercomputers and clusters and grids, oh my! Supercomputers and clusters and grids, oh my!

SLIDE 3

Technology Trends: Microprocessor Capacity

2X transistors/Chip Every 1.5 years

Called “Moore’s Law”

Microprocessors have become smaller, denser, and more powerful. And not just processors: bandwidth, storage, etc. 2X memory and processor speed, and ½ size, cost, & power, every 18 months. Gordon Moore (co-founder of Intel), Electronics Magazine, 1965

Number of devices/chip doubles every 18 months
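As a rough illustration, here is a minimal sketch (mine, not from the slides) of the "2X every 18 months" rule as a projection; the 42-million-transistor starting value is an arbitrary example, not a figure from the talk.

```python
# A minimal sketch of exponential "Moore's Law" growth: doubling every 1.5 years.
def project(value_now, years, doubling_period_years=1.5):
    """Return the projected value after `years` of exponential growth."""
    return value_now * 2 ** (years / doubling_period_years)

# Hypothetical example: a chip with 42 million devices today, projected 10 years out.
print(f"{project(42e6, 10):.2e} devices/chip")   # ~4.3e9
```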

SLIDE 4

Moore's Law

[Chart: peak supercomputer performance, 1950-2010, on a log scale from 1 KFlop/s to 1 PFlop/s, tracing EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific, and the Earth Simulator; architectural eras labeled Scalar, Super Scalar, Vector, Parallel, and Super Scalar/Vector/Parallel]

1941: 1 Flop/s (floating point operations per second)
1945: 100
1949: 1,000 (1 KiloFlop/s, KFlop/s)
1951: 10,000
1961: 100,000
1964: 1,000,000 (1 MegaFlop/s, MFlop/s)
1968: 10,000,000
1975: 100,000,000
1987: 1,000,000,000 (1 GigaFlop/s, GFlop/s)
1992: 10,000,000,000
1993: 100,000,000,000
1997: 1,000,000,000,000 (1 TeraFlop/s, TFlop/s)
2000: 10,000,000,000,000
2003: 35,000,000,000,000 (35 TFlop/s)


SLIDE 5

  • H. Meuer, H. Simon, E. Strohmaier, & JD
  • Listing of the 500 most powerful computers in the world

  • Yardstick: Rmax from LINPACK MPP (Ax = b, dense problem); see the sketch after this list

  • Updated twice a year: at SC'xy in the States in November, and at the meeting in Mannheim, Germany in June

  • All data available from www.top500.org

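As referenced above, here is a minimal sketch (mine, not the actual HPL/LINPACK benchmark code) of what the yardstick measures: time a dense Ax = b solve and convert it into a flop rate using the standard operation count. The problem size n = 4000 is an arbitrary choice.

```python
# Time a dense Ax = b solve and report a LINPACK-style Gflop/s rate.
import time
import numpy as np

n = 4000
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)                  # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard LINPACK operation count
print(f"n = {n}: {flops / elapsed / 1e9:.2f} Gflop/s")
```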

SLIDE 6

♦ A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.

♦ Over the last 10 years the range of the Top500 has increased faster than Moore's Law
♦ 1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
♦ 2003: #1 = 35.8 TFlop/s, #500 = 403 GFlop/s

What is a Supercomputer?

Why do we need them? Computational fluid dynamics, protein folding, climate modeling, and national security (in particular cryptanalysis and simulating nuclear weapons), to name a few.

SLIDE 7

A Tour de Force in Engineering

Homogeneous, Centralized, Proprietary, Expensive!

Target Application: CFD-Weather, Climate, Earthquakes

640 NEC SX/6 Nodes (mod)

5120 CPUs with vector operations; each CPU 8 Gflop/s peak

40 TFlop/s (peak)

~ 1/2 Billion € for machine, software, & building

Footprint of 4 tennis courts

7 MWatts

Say 10 cents/kWh: ~$16.8K/day = ~$6M/year! (checked in the sketch at the end of this slide)

Expected to stay on top of the Top500 until the 60-100 TFlop/s ASCI machine arrives

From the Top500 (November 2003) Performance of ESC > Σ Next Top 3 Computers
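A quick check of the operating-cost arithmetic above, using only the slide's own figures (7 MW draw, 10 cents per kWh):

```python
# Operating cost from power draw: kW * hours * price per kWh.
power_mw = 7.0
price_per_kwh = 0.10

cost_per_day = power_mw * 1000 * 24 * price_per_kwh
cost_per_year = cost_per_day * 365

print(f"${cost_per_day:,.0f}/day, ${cost_per_year / 1e6:.1f}M/year")
# -> $16,800/day, $6.1M/year
```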

SLIDE 8

November 2003

Rank | Manufacturer | Computer | Rmax (TFlop/s) | Installation Site | Year | # Proc | Rpeak (TFlop/s)
1 | NEC | Earth-Simulator | 35.8 | Earth Simulator Center, Yokohama | 2002 | 5120 | 40.90
2 | Hewlett-Packard | ASCI Q - AlphaServer SC ES45/1.25 GHz | 13.9 | Los Alamos National Laboratory, Los Alamos | 2002 | 8192 | 20.48
3 | Self | Apple G5 Power PC w/Infiniband 4X | 10.3 | Virginia Tech, Blacksburg, VA | 2003 | 2200 | 17.60
4 | Dell | PowerEdge 1750 P4 Xeon 3.06 GHz w/Myrinet | 9.82 | University of Illinois U/C, Urbana/Champaign | 2003 | 2500 | 15.30
5 | Hewlett-Packard | rx2600 Itanium2 1.5 GHz Cluster w/Quadrics | 8.63 | Pacific Northwest National Laboratory, Richland | 2003 | 1936 | 11.62
6 | Linux NetworX | Opteron 2 GHz w/Myrinet | 8.05 | Lawrence Livermore National Laboratory, Livermore | 2003 | 2816 | 11.26
7 | Linux NetworX | MCR Linux Cluster Xeon 2.4 GHz w/Quadrics | 7.63 | Lawrence Livermore National Laboratory, Livermore | 2002 | 2304 | 11.06
8 | IBM | ASCI White, SP Power3 375 MHz | 7.30 | Lawrence Livermore National Laboratory, Livermore | 2000 | 8192 | 12.29
9 | IBM | SP Power3 375 MHz 16 way | 7.30 | NERSC/LBNL, Berkeley | 2002 | 6656 | 9.984
10 | IBM | xSeries Cluster Xeon 2.4 GHz w/Quadrics | 6.59 | Lawrence Livermore National Laboratory, Livermore | 2003 | 1920 | 9.216

50% of the Top500 performance is in the top 9 machines; 131 systems > 1 TFlop/s; 210 machines are clusters; 1 is in Ireland (Vodafone)

SLIDE 9

TOP500 – Performance

  • Nov 2003

[Chart: TOP500 performance, June 1993 through November 2003, log scale from 100 Mflop/s to 1 Pflop/s, three series: N=1, N=500, and SUM. In June 1993 the sum was 1.17 TF/s, #1 (Fujitsu 'NWT' at NAL) was 59.7 GF/s, and #500 was 0.4 GF/s; in November 2003 the sum is 528 TF/s, #1 (NEC Earth Simulator) is 35.8 TF/s, and #500 is 403 GF/s. Intermediate #1 systems Intel ASCI Red (Sandia) and IBM ASCI White (LLNL) are marked, along with 'My Laptop' for comparison.]

SLIDE 10

Virginia Tech “Big Mac” G5 Cluster

♦ Apple G5 Cluster

Dual 2.0 GHz IBM Power PC 970s

16 Gflop/s per node

2 CPUs * 2 fma units/cpu * 2 GHz * 2(mul-add)/cycle

1100 Nodes or 2200 Processors

Theoretical peak 17.6 Tflop/s

Infiniband 4X primary fabric

Cisco Gigabit Ethernet secondary fabric

Linpack benchmark using 2112 processors: theoretical peak of 16.9 Tflop/s, achieved 10.28 Tflop/s

Could be #3 on 11/03 Top500

Cost is $5.2 million which includes the system itself, memory, storage, and communication fabrics
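A short sketch (my arithmetic) of the peak-performance math on this slide, using only the figures quoted above for the Virginia Tech G5 cluster:

```python
# Per-node peak: CPUs * FMA units per CPU * clock (GHz) * flops per FMA per cycle.
cpus_per_node   = 2
fma_units       = 2
clock_ghz       = 2.0
flops_per_cycle = 2          # a fused multiply-add counts as 2 flops

node_peak_gflops = cpus_per_node * fma_units * clock_ghz * flops_per_cycle
nodes = 1100

cluster_peak_tflops = node_peak_gflops * nodes / 1000
print(f"{node_peak_gflops:.0f} Gflop/s per node, {cluster_peak_tflops:.1f} Tflop/s peak")
# -> 16 Gflop/s per node, 17.6 Tflop/s peak

# Linpack efficiency on the 2112 processors actually benchmarked:
print(f"efficiency = {10.28 / 16.9:.0%}")   # ~61%
```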

SLIDE 11

Customer Types

SLIDE 12

NOW – Clusters

SLIDE 13

A Tool and A Market for Every Task

Capability

  • Each targets different applications
  • Understand application needs
  • 200K Honda units at 5 kW to equal a 1 GW nuclear plant

SLIDE 14

Taxonomy

Capability Computing
♦ Special purpose processors and interconnect
♦ High bandwidth, low latency communication
♦ Designed for scientific computing
♦ Relatively few machines will be sold
♦ High price

Cluster Computing
♦ Commodity processors and switch
♦ Processor design point is web servers & home PCs
♦ Leverage millions of processors
♦ Price point appears attractive for scientific computing

SLIDE 15

High Bandwidth vs Commodity Systems

♦ High bandwidth systems have traditionally been vector computers
   Designed for scientific problems; capability computing
♦ Commodity systems are designed for web servers and the home PC market
   Used for cluster-based computers, leveraging the price point
♦ Scientific computing needs are different
   Require a better balance between data movement and floating point operations; results in greater efficiency

System | Earth Simulator (NEC) | Cray X1 (Cray) | ASCI Q (HP ES45) | MCR (Dual Xeon) | VT Big Mac (Dual IBM PPC)
Year of Introduction | 2002 | 2003 | 2003 | 2002 | 2003
Node Architecture | Vector | Vector | Alpha | Pentium | Power PC
Processor Cycle Time | 500 MHz | 800 MHz | 1.25 GHz | 2.4 GHz | 2 GHz
Peak Speed per Processor | 8 Gflop/s | 12.8 Gflop/s | 2.5 Gflop/s | 4.8 Gflop/s | 8 Gflop/s
Bytes/flop to main memory | 4 | 3 | 1.28 | 0.9 | 0.8
Bytes/flop interconnect | 1.5 | 1 | 0.12 | 0.07 | 0.11
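The bytes/flop figures in the table are a balance ratio: sustained bandwidth divided by peak flop rate. A minimal sketch (mine, not from the slide); the 32 GB/s memory bandwidth in the example is an assumption chosen to be consistent with the Earth Simulator's 4 bytes/flop entry, not a number quoted on the slide.

```python
# System balance: bytes moved per floating-point operation at peak speed.
def bytes_per_flop(bandwidth_gb_s, peak_gflop_s):
    return bandwidth_gb_s / peak_gflop_s

# Example: an assumed 32 GB/s per processor against the 8 Gflop/s peak above.
print(bytes_per_flop(32.0, 8.0))   # -> 4.0 bytes/flop
```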

SLIDE 16

Top 5 Machines for the Linpack Benchmark

Rank | Computer (Full Precision) | Number of Procs | Achieved TFlop/s | Peak TFlop/s | Efficiency
1 | Earth Simulator, NEC SX-6 | 5120 | 35.9 | 41.0 | 87.5%
2 | LANL ASCI Q, AlphaServer EV-68 (1.25 GHz w/Quadrics) | 8160 | 13.9 | 20.5 | 67.7%
3 | VT Apple G5, dual IBM Power PC (2 GHz, 970s, w/Infiniband 4X) | 2112 | 10.3 | 16.9 | 60.9%
4 | UIUC Dell, Xeon Pentium 4 (3.06 GHz w/Myrinet) | 2500 | 9.8 | 15.3 | 64.1%
5 | PNNL HP rx2600, Itanium 2 (1.5 GHz w/Quadrics) | 1936 | 8.6 | 11.6 | 74.1%

SLIDE 17

Phases I - III

Phase I: Industry Concept Study, 5 companies, $10M each
Phase II: R&D, 3 companies, ~$50M each
Phase III: Full Scale Development, commercially ready in the 2007 to 2010 timeframe, $100M?

[Timeline chart, fiscal years 2002-2010: critical program milestones including industry procurements and reviews (concept reviews, PDR, DDR, system design review, Phase II readiness reviews, Phase III readiness review), metrics and benchmarks, requirements and metrics, technology assessments, industry application analysis, performance assessment, academia research platforms, early software tools, early pilot platforms, research prototypes & pilot systems, and HPCS capability or products]

SLIDE 18

Performance Extrapolation

[Chart: TOP500 performance extrapolated from June 1993 out past 2010, log scale from 100 MFlop/s to 10 PFlop/s, series N=1, N=500, and Sum. Markers show roughly when a TFlop/s will be needed just to enter the list and when a PFlop/s computer is expected, with Blue Gene (130,000 proc) and ASCI Purple (12,544 proc) indicated.]

SLIDE 19

Performance Extrapolation

[The same extrapolation chart, June 1993 through June 2010, with 'My Laptop' marked near 10^9 Flop/s (about 1 GFlop/s) for comparison.]
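A minimal sketch (mine, not the method behind the actual chart) of this kind of extrapolation: fit exponential growth to the two #1-system data points quoted on earlier slides (59.7 GFlop/s in 1993, 35.8 TFlop/s in 2003) and project when 1 PFlop/s would be reached.

```python
# Fit exponential growth through two data points and extrapolate.
import math

r1993, r2003 = 59.7e9, 35.8e12            # #1 system performance, Flop/s
growth_per_year = (r2003 / r1993) ** (1 / 10)   # ~1.9x per year

years_to_pflop = math.log(1e15 / r2003) / math.log(growth_per_year)
print(f"growth ~{growth_per_year:.2f}x/year, 1 PFlop/s around {2003 + years_to_pflop:.0f}")
# -> roughly 2008, in line with the chart's extrapolation
```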

SLIDE 20

ASCI Purple & IBM Blue Gene/L

♦ Announced 11/19/02: one of 2 machines for LLNL
♦ IBM Research BlueGene/L: 360 TFlop/s, 130,000 processors, Linux, FY 2005
   Preliminary machine: PowerPC 440, 500 MHz, with custom processor/interconnect; 512 nodes (1024 processors); 1.435 Tflop/s (2.05 Tflop/s peak)
♦ Plus ASCI Purple: IBM Power 5 based, 12K processors, 100 TFlop/s

SLIDE 21

SETI@home: Global Distributed Computing

♦ Running on 500,000 PCs, ~1300 CPU years per day (see the sketch at the end of this slide); 1.3M CPU years so far

♦ Sophisticated data & signal processing analysis
♦ Distributes datasets from the Arecibo Radio Telescope
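A quick sanity check (my arithmetic) of the "~1300 CPU years per day" figure referenced above: 500,000 PCs each contributing one CPU-day per day.

```python
# CPU-years delivered per calendar day by the participating machines.
pcs = 500_000
cpu_years_per_day = pcs / 365.25
print(f"{cpu_years_per_day:.0f} CPU years per day")   # ~1369
```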

SLIDE 22

SETI@home

♦ Uses thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence.
♦ When a participant's computer is idle or otherwise being wasted, the software downloads a ~half-MB chunk of data for analysis, performing about 3 Tflop of computation per client over roughly 15 hours.

♦ The results of this analysis are sent back to the SETI team and combined with those from thousands of other participants.
♦ Largest distributed computation project in existence

Averaging 72 Tflop/s

SLIDE 23

Google query attributes

150M queries/day (2,000/second); 100 countries; 3B documents in the index

♦ Data centers
   15,000 Linux systems in 6 data centers
   15 TFlop/s and 1000 TB total capability
   40-80 1U/2U servers per cabinet; 100 Mbit Ethernet switches per cabinet with gigabit Ethernet uplink
   Growth from 4,000 systems (June 2000); 18M queries/day then

♦ Performance and operation: simple reissue of failed commands to new servers; no performance debugging (problems are not reproducible)

Source: Monika Henzinger, Google & Cleve Moler

Forward links are referred to in the rows; back links are referred to in the columns.

Eigenvalue problem, n = 3×10^9 (see: MathWorks, Cleve's Corner)
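A minimal sketch (mine, not Google's implementation) of the eigenvalue problem described here: power iteration on a tiny link matrix, with forward links in the rows and back links in the columns as the slide states. The 4-page example graph and the 0.85 damping factor are illustrative assumptions.

```python
# Power iteration on a small web-link matrix.
import numpy as np

# link[i, j] = 1 if page i links to page j (forward links in the rows)
link = np.array([[0, 1, 1, 0],
                 [0, 0, 1, 0],
                 [1, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

n = link.shape[0]
transition = link / link.sum(axis=1, keepdims=True)   # row-stochastic matrix
damping = 0.85

rank = np.full(n, 1.0 / n)
for _ in range(100):
    # follow back links (columns) into each page, plus a uniform restart term
    rank = damping * transition.T @ rank + (1 - damping) / n

print(rank / rank.sum())
```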

SLIDE 24

Extreme Example: Sony PlayStation2

♦ Emotion Engine: 6.2 Gflop/s, 75 million polygons per second (Microprocessor Report, 13:5)
♦ Superscalar MIPS core + vector coprocessor + graphics/DRAM
♦ About $200

SLIDE 25

Computing On Toys

♦ Sony PlayStation2
   6.2 GF peak; 70M polygons/second; 10.5M transistors
   Superscalar RISC core plus vector units (each: 19 mul-adds & 1 divide each 7 cycles)
♦ $199 retail: a loss leader for game sales
♦ 100-unit cluster at U of I
   Linux software and vector unit use; over 0.5 TF peak, but hard to program & hard to extract performance…
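A quick check (my arithmetic) of the cluster figure on this slide: 100 units at the 6.2 Gflop/s peak quoted above.

```python
# Aggregate peak of the PlayStation2 cluster described on this slide.
units = 100
gflops_each = 6.2
print(f"{units * gflops_each / 1000:.2f} TF peak")   # -> 0.62 TF, i.e. over 0.5 TF
```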

SLIDE 26

Science and Technology

♦ Today, large science projects are conducted by global teams using sophisticated combinations of computers, networks, visualization, data storage, remote instruments, people, and other resources
♦ Information infrastructure provides a way to integrate these resources to support modern applications

SLIDE 27

Grid Computing is About…

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Diagram: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, advanced visualization]

The most pressing scientific challenges require application solutions that are multidisciplinary and multi-scale.

SLIDE 28

The Grid

♦ Motivation: when communication is close to free, we should not be restricted to local resources when solving problems.
♦ Infrastructure that builds on the Internet and the Web
♦ Enable and exploit large-scale sharing of resources
♦ Virtual organizations: loosely coordinated groups
♦ Provides for remote access to resources
   Scalable, secure, reliable mechanisms for discovery and access

SLIDE 29

Grid Software Challenges

♦ Simplified programming: reduced complexity and coordination
♦ Accounting and resource economies
   "non-traditional" resources and concurrency
   shared resource costs and denial of service
   negotiation and equilibration
   exchange rates and sharing
♦ Scheduling and adaptation
   performance, fault-tolerance, and access
   networks, computing, storage, and sensors
♦ On-demand access
   unique observational events and sensor fusion
   "instant" access and nimble scheduling
♦ Managing bandwidth and latency
   lambda dominance and exploitation

SLIDE 30

The Grid

SLIDE 31

Science Grid Projects

SLIDE 32

TeraGrid 2003

Prototype for a National Cyberinfrastructure

[Network map with site-to-site links of 40 Gb/s, 30 Gb/s, 20 Gb/s, 10 Gb/s, and 10 Gb/s]

SLIDE 33

SuperSINET and Applications

[Network map: SuperSINET, operated by NII, connecting KEK, U. of Tokyo, NIG, ISAS, Nagoya U., Kyoto U., Osaka U., NIFS, Kyushu U., Hokkaido U., Okazaki Research Institutes, Tohoku U., Tsukuba U., Tokyo Institute of Tech., Waseda U., Doshisha U., NAO, and NII R&D. Applications include a DataGRID for high-energy science, a computational GRID and NAREGI nano-technology grid applications, OC-48+ transmission for radio telescopes, and bio-informatics.]

SLIDE 34

Atmospheric Sciences Grid

[Workflow diagram: real-time data, data fusion, a general circulation model, a regional weather model, a photo-chemical pollution model, a particle dispersion model, a bushfire model, a topography database, a vegetation database, and an emissions inventory]

SLIDE 35

Standard Implementation

[The same workflow, annotated with coupling technologies: real-time data arrives via GASS; the models exchange data via MPI; datasets move between components via GASS/GridFTP/GRC; the bushfire model is coupled via GASS; a "Change Models" annotation.]
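A minimal sketch (not from the slides) of what "coupled via MPI" looks like for two of these components; mpi4py and the component roles named in the comments are illustrative assumptions, since the slide does not name a language or binding.

```python
# Two model components exchanging a boundary field over MPI.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

boundary = np.zeros(1000)

if rank == 0:
    # e.g. the regional weather model: compute, then send boundary fields
    boundary[:] = 1.0                      # stand-in for real model output
    comm.Send(boundary, dest=1, tag=0)
elif rank == 1:
    # e.g. the particle dispersion model: receive the fields and carry on
    comm.Recv(boundary, source=0, tag=0)
    print("received boundary data:", boundary[:3])
```

Run under an MPI launcher, e.g. `mpiexec -n 2 python coupled.py`.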

SLIDE 36

The Computing Continuum

♦ Each strikes a different balance of computation/communication coupling
♦ Implications for execution efficiency
♦ Applications for diverse needs: computing is only one part of the story!

[Spectrum from loosely coupled to tightly coupled: "Grids", special purpose "SETI / Google" systems, clusters, and highly parallel machines]

SLIDE 37

Grids vs. Capability vs. Cluster Computing

♦ Not an "either/or" question
   Each addresses different needs; each is part of an integrated solution
♦ Grid strengths
   Coupling necessarily distributed resources: instruments, software, hardware, archives, and people
   Eliminating time and space barriers: remote resource access and capacity computing
   Grids are not a cheap substitute for capability HPC
♦ Capability computing strengths
   Supporting foundational computations: terascale and petascale "nation scale" problems
   Engaging tightly coupled computations and teams
♦ Clusters
   Low cost, group solution; potential hidden costs
♦ Key is easy access to resources in a transparent way

SLIDE 38

The Real Crisis With HPC Is With The Software

♦ It's time for a change
   Complexity is rising dramatically: highly parallel and distributed systems, from 10 to 100 to 1,000 to 10,000 to 100,000 processors!
   Multidisciplinary applications
♦ Programming is stuck: arguably it hasn't changed since the 60's
♦ A supercomputer application and its software are usually much longer-lived than the hardware
   Hardware life is typically five years at most; Fortran and C are the main programming models
♦ Software is a major cost component of modern technologies
   The tradition in HPC system procurement is to assume that the software is free
♦ We don't have any great ideas about how to solve this problem

SLIDE 39

SLIDE 40

Future Directions

♦ Silicon: escaping the von Neumann bottleneck; streaming vector, dense packages, …, processing in memory (PIM)
♦ Optical computing
♦ Biological computing
♦ Quantum computing

[Diagram: an optical device with DWDM input and DWDM output]

SLIDE 41

Collaborators / Support

♦ TOP500

  • H. Meuer, Mannheim U
  • H. Simon, NERSC
  • E. Strohmaier, NERSC