 
The Grid, NetSolve, and Its Applications
11-13 February 2002
Jack Dongarra
Computer Science Department
University of Tennessee

Outline
• Overview of High Performance Computing
• The Grid
• NetSolve
Moore’s Law
[Figure: peak performance versus year, 1950 to 2010, on a scale from 1 KFlop/s to 1 PFlop/s. Architecture eras labeled: Scalar, Super Scalar, Vector, Parallel, Super Scalar/Vector/Parallel. Machines plotted: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific.]

H. Meuer, H. Simon, E. Strohmaier, & JD
• Listing of the 500 most powerful computers in the world
• Yardstick: Rmax from LINPACK MPP (Ax=b, dense problem)
• Updated twice a year: at SC‘xy in the States in November, and at the meeting in Mannheim, Germany in June
• All data available from www.top500.org
[Figure: TPP performance, rate versus size.]
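The yardstick above is the rate achieved on a dense Ax=b solve. As a rough illustration only, here is a minimal pure-Python sketch that times such a solve and converts the time to a rate. It is not the LINPACK benchmark code: the function names are made up for this example, and the operation count 2n^3/3 + 2n^2 is the nominal count conventionally used with LINPACK, assumed here rather than taken from the slides.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Works on copies, so the caller's matrix and vector are left intact.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Nominal operation count for an n-by-n dense solve (assumed convention)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure_rate(n, seed=0):
    """Return (rate in flop/s, max residual) for a random n-by-n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return linpack_flops(n) / elapsed, residual


if __name__ == "__main__":
    rate, residual = measure_rate(100)
    print(f"n=100: {rate / 1e6:.2f} Mflop/s, max residual {residual:.2e}")
```

An interpreted solver like this will report a rate far below what the hardware can sustain. The point is only the definition of the yardstick: a fixed operation count divided by measured time.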
Fastest Computer Over Time
[Figure: scatter plot of GFlop/s (0 to 70) versus year, 1990 to 2000. Systems plotted, with processor counts in parentheses: Cray Y-MP (8), TMC CM-2 (2048), Fujitsu VP-2600.]
In 1980 a computation that took 1 full year to complete can now be done in ~10 hours!

Fastest Computer Over Time
[Figure: the same plot rescaled to 0 to 700 GFlop/s. Systems added: NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040).]
In 1980 a computation that took 1 full year to complete can now be done in ~16 minutes!
Fastest Computer Over Time
[Figure: the same plot rescaled to 0 to 7000 GFlop/s. Systems added: SGI ASCI Blue Mountain (5040), ASCI Blue Pacific SST (5808), Intel ASCI Red (9152), Intel ASCI Red Xeon (9632), ASCI White Pacific (7424).]
In 1980 a computation that took 1 full year to complete can today be done in ~27 seconds!

Performance Development
[Figure: TOP500 performance on a log scale from 100 Mflop/s to 1 Pflop/s, June 1993 to November 2001, with "My Laptop" marked for comparison.]
• SUM: 1.167 TF/s rising to 134 TF/s
• N=1: 59.7 GF/s rising to 7.23 TF/s. Systems labeled: Intel XP/S140 (Sandia), Fujitsu 'NWT' (NAL), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), IBM ASCI White (LLNL)
• N=500: 0.4 GF/s rising to 94 GF/s. Systems labeled: SNI VP200EX (Uni Dresden), IBM 604e, 69 proc (A&P), Sun HPC 10000 (Merrill Lynch), IBM SP, 232 procs (Chase Manhattan NY)
Notes: [60 G - 400 M] [7.2 Tflop/s - 94 Gflop/s], Schwab #24, 1/2 per year, 394 > 100 Gf, faster than Moore’s law, all parallel
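The "faster than Moore’s law" note can be checked with a little arithmetic on the endpoints quoted on the Performance Development slide. The sketch below assumes steady exponential growth and assumes the endpoints belong to the June 1993 and November 2001 lists. The dates are read off the figure axis, and the 18-month doubling period often quoted for Moore's law is likewise an assumption, not a figure from the slides.

```python
import math


def doubling_time_years(start_value, end_value, years):
    """Years needed to double, assuming steady exponential growth."""
    if start_value <= 0 or end_value <= start_value or years <= 0:
        raise ValueError("need positive values with end_value > start_value")
    return years * math.log(2.0) / math.log(end_value / start_value)


# Assumption: the endpoints are the June 1993 and November 2001 lists.
YEARS = (2001 + 10 / 12) - (1993 + 5 / 12)

# Endpoints quoted on the slide, converted to Gflop/s.
SERIES = {
    "SUM": (1167.0, 134000.0),
    "N=1": (59.7, 7230.0),
    "N=500": (0.4, 94.0),
}

if __name__ == "__main__":
    for name, (start, end) in SERIES.items():
        months = 12.0 * doubling_time_years(start, end, YEARS)
        print(f"{name}: x{end / start:.0f} growth, doubling every {months:.1f} months")
```

Under these assumptions all three series double in roughly 13 to 15 months, which is shorter than an 18-month doubling period.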
Performance Development
[Figure: TOP500 performance in GFlop/s on a log scale from 0.1 to 1,000,000 (1 PFlop/s), June 1993 to June 2009, extrapolated. Series: Sum, N=1, N=500. Marked: ASCI, Earth Simulator, "My Laptop".]
Notes: Entry 1 T 2005 and 1 P 2010

Distributed and Parallel Systems
[Figure: a spectrum of systems running from distributed, heterogeneous systems to massively parallel, homogeneous systems. Legible labels include Grid based Computing, Clusters w/ special interconnect, and ASCI Tflops.]

Distributed systems (heterogeneous)
• Gather (unused) resources
• Steal cycles
• System SW manages resources
• System SW adds value
• 10% - 20% overhead is OK
• Resources drive applications
• Time to completion is not critical
• Time-shared
• SETI@home
  » ~400,000 machines
  » Averaging 27 Tflop/s

Massively parallel systems (homogeneous)
• Bounded set of resources
• Apps grow to consume all cycles
• Application manages resources
• System SW gets in the way
• 5% overhead is maximum
• Apps drive purchase of equipment
• Real-time constraints
• Space-shared
• ASCI White LLNL
  » 8000 processors
  » Averaging 7.2 Tflop/s
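The two closing examples can be compared per node with simple division. This is a small sketch using only the figures quoted on the slide. The helper name is made up for illustration, and a "node" here means one SETI@home machine or one ASCI White processor.

```python
def per_node_mflops(total_tflops, nodes):
    """Average sustained rate per node, in Mflop/s."""
    return total_tflops * 1e6 / nodes


# Figures quoted on the slide.
seti = per_node_mflops(27.0, 400_000)   # SETI@home: ~400,000 machines
white = per_node_mflops(7.2, 8000)      # ASCI White: 8000 processors

if __name__ == "__main__":
    print(f"SETI@home:  {seti:.1f} Mflop/s per machine")
    print(f"ASCI White: {white:.1f} Mflop/s per processor")
    print(f"ratio: {white / seti:.1f}x")
```

On these numbers the distributed system wins on aggregate throughput while each of its machines contributes a much smaller average rate, which fits the "gather (unused) resources" description.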
What is Grid Computing?
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
[Figure: data acquisition; advanced visualization, analysis; computational resources; imaging instruments; large-scale databases.]

The Computational Grid is…
• …a distributed control infrastructure that allows applications to treat compute cycles as commodities.
• Power Grid analogy
  » Power producers: machines, software, networks, storage systems
  » Power consumers: user applications
• Applications draw power from the Grid the way appliances draw electricity from the power utility.
  » Seamless
  » High-performance
  » Ubiquitous
  » Dependable
Computational Grids and Electric Power Grids
• Why the Computational Grid is like the Electric Power Grid
  » Electric power is ubiquitous
  » Don’t need to know the source of the power (transformer, generator) or the power company that serves it
• Why the Computational Grid is different from the Electric Power Grid
  » Wider spectrum of performance
  » Wider spectrum of services
  » Access governed by more complicated issues: security, performance, socio-political factors

An Emerging Grid Community
1995-2000
• “Grid book” gave a comprehensive view of the state of the art
• Important infrastructure and middleware efforts initiated
  » Globus
  » Legion
  » Condor
  » NetSolve, Ninf
  » Storage Resource Broker
  » Network Weather Service
  » AppLeS, …
Grids are Hot
• IPG NAS-NASA: http://nas.nasa.gov/~wej/home/IPG
• Globus: http://www.globus.org/
• Legion: http://www.cs.virgina.edu/~grimshaw/
• AppLeS: http://www-cse.ucsd.edu/groups/hpcl/apples
• NetSolve: http://www.cs.utk.edu/netsolve/
• NINF: http://phase.etl.go.jp/ninf/
• Condor: http://www.cs.wisc.edu/condor/
• CUMULVS: http://www.epm.ornl.gov/cs/cumulvs.html
• WebFlow: http://www.npac.syr.edu/users/gcf/
• LoCI: http://loci.cs.utk.edu/

The Grid
The Grid Architecture Picture
[Figure: layered architecture, top to bottom.]
• User Portals: Problem Solving Environments, Application Science Portals
• Grid Access & Info
• Service Layers: Resource Discovery & Allocation, Co-Scheduling, Fault Tolerance, Authentication, Events, Naming & Files
• Resource Layer: Computers, Data bases, Online instruments, Software
• High speed networks and routers

Globus Grid Services
• The Globus toolkit provides a range of basic Grid services
  » Security, information, fault detection, communication, resource management, ...
• These services are simple and orthogonal
  » Can be used independently, mix and match
  » Programming model independent
• For each there are well-defined APIs
• Standards are used extensively
  » E.g., LDAP, GSS-API, X.509, ...
• You don’t program in Globus, it’s a set of tools like Unix
Broad Acceptance of Grids as a Critical Platform for Computing
• Widespread interest from government in developing computational Grid platforms
  » NSF’s Cyberinfrastructure
  » NASA’s Information Power Grid
  » DOE’s Science Grid

Broad Acceptance of Grids as a Critical Platform for Computing
• Widespread interest from industry in developing computational Grid platforms
  » IBM, Sun, Entropia, Avaki, Platform, …
On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion into building 50 computer server farms around the world.
Grids Form the Basis of a National Information Infrastructure
August 9, 2001: NSF awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for TeraGrid
TeraGrid will provide in aggregate
• 13.6 trillion calculations per second
• Over 600 trillion bytes of immediately accessible data
• 40 gigabit per second network speed
• Provide a new paradigm for data-oriented computing
• Critical for disaster response, genomics, environmental modeling, etc.

Motivation for NetSolve
Design an easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX and Windows platforms
Basics
• Client-Server Design
• Non-hierarchical system
• Load Balancing and Fault Tolerance
• Heterogeneous Environment Supported
• Multiple and simple client interfaces
• Built on standard components
NetSolve Network Enabled Server
• NetSolve is an example of a Grid based hardware/software server.
• Based on a Remote Procedure Call model but with …
  » resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
• Ease-of-use paramount
• Other examples are NEOS from Argonne and NINF from Japan.

NetSolve
• Target not computer scientist, but domain scientist
• Hide logistical details
  » User shouldn’t have to worry about how or where (issues about reproducibility)
• Present the set of available remote resources as a “multi-purpose” machine with a wealth of scientific software
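The idea of a remote procedure call extended with resource discovery, load balancing, and fault tolerance can be sketched in a few lines. The code below is an in-process simulation only and does not use the NetSolve API. `Agent`, `Server`, `remote_call`, and the load figures are hypothetical names and values invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ServerDown(Exception):
    """Raised by a simulated server that cannot be reached."""


@dataclass
class Server:
    name: str
    load: float                      # current workload; lower is better
    problems: Dict[str, Callable]    # problem name -> routine offered
    alive: bool = True

    def run(self, problem, *args):
        if not self.alive:
            raise ServerDown(self.name)
        return self.problems[problem](*args)


class Agent:
    """Hypothetical broker: knows which servers offer which problem."""

    def __init__(self, servers: List[Server]):
        self.servers = servers

    def candidates(self, problem: str) -> List[Server]:
        # Resource discovery: keep only servers that offer the problem.
        offering = [s for s in self.servers if problem in s.problems]
        # Load balancing: least-loaded server first.
        return sorted(offering, key=lambda s: s.load)


def remote_call(agent: Agent, problem: str, *args):
    """Try servers in the agent's order; move on when one fails."""
    tried = []
    for server in agent.candidates(problem):
        try:
            return server.run(problem, *args), server.name
        except ServerDown:
            tried.append(server.name)   # fault tolerance: try the next one
    raise RuntimeError(f"no server could solve {problem!r}; tried {tried}")


def dot(x, y):
    """Stand-in for a numerical routine hosted on a server."""
    return sum(a * b for a, b in zip(x, y))


if __name__ == "__main__":
    agent = Agent([
        Server("fast-but-down", load=0.1, problems={"dot": dot}, alive=False),
        Server("busy", load=0.9, problems={"dot": dot}),
        Server("idle", load=0.2, problems={"dot": dot}),
    ])
    result, used = remote_call(agent, "dot", [1, 2, 3], [4, 5, 6])
    print(result, "computed on", used)
```

The caller names a problem and passes data. Which machine runs it, and what happens when the first choice is unreachable, stays hidden behind `remote_call`, in the spirit of the "hide logistical details" bullet.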