The Grid, NetSolve, and Its Applications
Jack Dongarra
Computer Science Department, University of Tennessee
11-13 February 2002
Outline
» Overview of High Performance Computing
» The Grid
» NetSolve
[Chart: Moore's Law and peak supercomputer (TPP) performance, 1950-2010, rising from 1 KFlop/s to ~1 PFlop/s; performance grows with both size and rate. Machines along the curve: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific. Eras: Scalar, Super Scalar, Vector, Parallel, Super Scalar/Vector/Parallel.]

In 1980 a computation that took 1 full year to complete could, by the vector/early-parallel era (Cray Y-MP (8), TMC CM-2 (2048), Fujitsu VP-2600), be done in ~10 hours; by the MPP era (Hitachi CP-PACS (2040), Intel Paragon (6788), Fujitsu VPP-500 (140), TMC CM-5 (1024), NEC SX-3 (4)), in ~16 minutes; and today (ASCI White Pacific (7424), Intel ASCI Red Xeon (9632), SGI ASCI Blue Mountain (5040), Intel ASCI Red (9152), ASCI Blue Pacific SST (5808)), in ~27 seconds!
[Chart: TOP500 performance growth, June 1993 through November 2001, on a log scale from 100 MFlop/s to 1 PFlop/s. Sum of all 500 systems: 1.167 TFlop/s in 1993 to 134 TFlop/s in 2001. N=1: 59.7 GFlop/s to 7.23 TFlop/s; #1 systems over the period include the Intel XP/S140 (Sandia), Fujitsu 'NWT' (NAL), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), and IBM ASCI White (LLNL). N=500: 0.4 GFlop/s (SNI VP200EX, Uni Dresden) to 94 GFlop/s. Commercial entries: an IBM SP, 232 procs, at Chase Manhattan NY; a Sun HPC 10000 at Merrill Lynch; an IBM 604e, 69 procs, at A&P; Schwab's system at #24; a "My Laptop" marker for scale. Notes: roughly half the list turns over each year, 394 systems exceed 100 GFlop/s, growth is faster than Moore's law, and all systems are parallel.]
[Chart: extrapolation of TOP500 performance (GFlop/s, log scale, 0.1 to 1,000,000), June 1993 onward, tracking N=1, N=500, and Sum. Extrapolating the trend lines, a 1 TFlop/s system enters the list at N=500 around 2005 and a 1 PFlop/s system around 2010; ASCI machines, the Earth Simulator, and "My Laptop" are marked for reference.]
[Diagram: the spectrum of architectures, from heterogeneous distributed systems to homogeneous massively parallel systems: Grid-based computing, SETI@home, Entropia, networks of workstations, Beowulf clusters, clusters with special interconnect, parallel distributed-memory machines, ASCI Tflops.]
[Diagram: the Grid links imaging instruments, computational resources, and large-scale databases for data acquisition, analysis, and advanced visualization.]
…a distributed control infrastructure. The power grid analogy:
» Power producers: machines, software, networks, storage systems
» Power consumers: user applications
» Applications draw power from the Grid the way appliances draw electricity from the power utility.
The Grid should be seamless, high-performance, ubiquitous, and dependable.
Why the analogy works:
» Electric power is ubiquitous.
» You don't need to know the source of the power (transformer, generator) or the power company that serves it.
Why the analogy is imperfect:
» Wider spectrum of performance
» Wider spectrum of services
» Access governed by more complicated issues:
  » Security
  » Performance
  » Socio-political factors
The “Grid book” gave a comprehensive view of the state of the art. Important infrastructure and middleware efforts initiated:
» Globus
» Legion
» Condor
» NetSolve, Ninf
» Storage Resource Broker
» Network Weather Service
» AppLeS, …
» IPG NAS-NASA: http://nas.nasa.gov/~wej/home/IPG
» Globus: http://www.globus.org/
» Legion: http://www.cs.virgina.edu/~grimshaw/
» AppLeS: http://www-cse.ucsd.edu/groups/hpcl/apples
» NetSolve: http://www.cs.utk.edu/netsolve/
» NINF: http://phase.etl.go.jp/ninf/
» Condor: http://www.cs.wisc.edu/condor/
» CUMULVS: http://www.epm.ornl.gov/cs/cumulvs.html
» WebFlow: http://www.npac.syr.edu/users/gcf/
» LoCI: http://loci.cs.utk.edu/
[Diagram: Grid architecture layers.
Resource Layer: high-speed networks and routers, computers, databases, online instruments.
Service Layers: user portals, authentication, co-scheduling, naming & files, events, Grid access & info, problem-solving environments, application science portals, resource discovery & allocation, fault-tolerance software.]
The Globus toolkit provides a range of basic Grid services: security, information, fault detection, communication, resource management, ...
» These services are simple and orthogonal: they can be used independently, mixed and matched, and are programming-model independent.
» For each there are well-defined APIs, and standards are used extensively, e.g., LDAP, GSS-API, X.509, ...
You don’t program “in Globus”; it’s a set of tools.
Widespread interest from government in Grid computing:
» NSF’s Cyberinfrastructure
» NASA’s Information Power Grid
» DOE’s Science Grid
Widespread interest from industry in Grid computing: IBM, Sun, Entropia, Avaki, Platform, …
» On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing; the AP reported that IBM was investing $4 billion in building 50 computer server farms around the world.
» On August 9, 2001, NSF awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for the TeraGrid.
NetSolve goal: design an easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX and Windows platforms.
» Client-server design
» Non-hierarchical system
» Load balancing and fault tolerance
» Heterogeneous environments supported
» Multiple, simple client interfaces
» Built on standard components
NetSolve is an example of a Grid-based application. It is based on a remote procedure call model, but adds resource discovery and dynamic problem solving. Ease of use is paramount. Other examples are NEOS from Argonne and Ninf from ETL in Japan.
The target user is not the computer scientist but the domain scientist:
» Hide logistical details: the user shouldn’t have to worry about how or where the computation runs (issues about reproducibility).
» Present the set of available remote resources as a single, uniform service.
[Diagram, four animation frames collapsed: NetSolve in action. Clients (Matlab, Mathematica, C, Fortran, Java, Excel) contact the agent(s), which maintain a schedule database over servers S1-S4. The agent answers a request with a server choice (“S2!”); the client sends the problem to S2, data can be staged through an IBP depot, and a handle to the result is passed back.]
Uses of NetSolve:
» Grid-based numerical library routines: the user doesn’t have to have the software library on their machine (LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK).
» Task farming applications: “pleasantly parallel” execution, e.g. parameter studies.
» Remote application execution: complete applications with the user specifying input parameters and receiving output.
“Blue collar” Grid-based computing:
» Does not require deep knowledge of network programming.
» Level of expressiveness is right for many users.
» Users can set things up themselves; no “su” required.
» In use today, with up to 200 servers in 9 countries.
» Can plug into Globus, Condor, NINF, …
The NetSolve agent:
» Name server for the NetSolve system.
» Information service: client users and administrators can query the hardware and software services available.
» Resource scheduler: maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources.
Resource scheduling considers:
» CPU performance (LINPACK)
» Network bandwidth and latency
» Server workload
» Problem size / algorithm complexity
The agent calculates a “time to compute” for each appropriate server and notifies the client of the most appropriate one.
The client interface:
» Function-based: the client program embeds a call to NetSolve.
» Interfaces are available for C, Fortran, Matlab, Mathematica, and Java.
» Networking interactions are opaque to the user.
The Matlab interface is intuitive and easy to use; a matrix multiply, for example, is just:
  A = matmul(B, C);
Adding functionality to a server:
i. Describe the interface
ii. Generate the wrapper
iii. Install into the server

A problem description file for a dense linear solver from LAPACK:

  @PROBLEM dgesv
  @DESCRIPTION This is a linear solver for dense matrices from the LAPACK
  @INPUT 2
  @OBJECT MATRIX DOUBLE A Double precision matrix
  @OBJECT VECTOR DOUBLE b Right hand side
  @OUTPUT 1
  @OBJECT VECTOR DOUBLE x …

[Diagram: the new service joins the existing services on the server.]
Task farming:
» One solution: many calls to netslnb( );  /* non-blocking */
» The farming solution: a single call to netsl_farm( );
» The request iterates over an “array of inputs.”
» Adaptive scheduling algorithm.
» Useful, e.g., for parameter sweeping.
Request sequencing:
» Chain together a sequence of NetSolve requests.
» Analyze the parameters to determine data dependencies.
» Transmit the superset of all inputs/outputs once.
» Schedule the individual request modules for execution.
Example:
  command1(A, B) -> result C
  command2(A, C) -> result D
  command3(D, E) -> result F
  sequence(A, B, E) -> result F
Here A and B are inputs, C and D are intermediate outputs that never return to the client, E is an input to the final step, and F is the result.
Campus activities: Engineering, the Medical School, Computational Ecology, and Electrical Engineering, with work on middleware development and logistical networking.
The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, the National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives, including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, the Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX). The campus network is a meshed ATM OC-12 being migrated to switched Gigabit by early 2002.
Collaborators: … (Bartol), CMU (J. Stiles), UTK (Dongarra, M. Miller, R. Wolski).
IPARS: Integrated Parallel Accurate Reservoir Simulator (Mary Wheeler’s group, UT-Austin).
» Reservoir and environmental simulation: models black oil, waterflood, and compositional; 3D transient flow of multiple phases.
» Integrates existing simulators: the framework simplifies development and provides solvers, handling for wells, and table lookup, plus a pre/postprocessor and visualization.
» Full IPARS access without installation. IPARS interfaces: C, Fortran, Matlab, Mathematica, and the Web.
[Diagram: Web interface -> Web server -> NetSolve client -> IPARS-enabled servers.]
[Diagram, three animation frames collapsed: the proxy architecture. PSE front-ends (C, Fortran, Matlab, Mathematica, SCIRun, custom clients) make remote procedure calls into the Grid middleware, which provides resource discovery, system management, resource scheduling, and fault tolerance. Proxies (Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy) map requests onto Grid back-ends: NetSolve servers under Globus, Ninf servers, plain NetSolve servers, and NetSolve servers under Condor.]
Future work:
» Security: using Kerberos V5 for authentication.
» Separate server characteristics: implementing hardware and software servers.
» Hierarchy of agents: a more scalable configuration.
» Monitoring the NetSolve network: track and monitor usage; network status via the Network Weather Service.
» Internet Backplane Protocol: middleware for managing and using remote storage.
» Fault tolerance: local/global configurations; the dynamic nature of servers.
» Automated adaptive algorithm selection: dynamically determine the best algorithm based on system status and the nature of the user’s problem.
Conclusions:
» It is an exciting time to be in scientific computing.
» Network computing is here.
» The Grid offers tremendous opportunities.
» It is important to develop algorithms and software that can exploit it.
Credits:
» Top500: Erich Strohmaier (NERSC), Horst Simon (NERSC), Hans Meuer (Mannheim U)
» NetSolve: Henri Casanova (UCSD), Michelle Miller (UTK), Sathish Vadhiyar (UTK)