
The Grid, NetSolve, and Its Applications

11-13 February 2002

Jack Dongarra, Computer Science Department, University of Tennessee

Outline

  • Overview of High Performance Computing
  • The Grid
  • NetSolve


Moore's Law

[Chart: peak performance of the fastest computers, 1950-2010 (with 2005 and 2010 projections), on a log scale from 1 KFlop/s to 1 PFlop/s. Machines: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific. Eras: scalar, super scalar, vector, parallel, super scalar/vector/parallel.]

  • Listing of the 500 most powerful computers in the world
  • H. Meuer, H. Simon, E. Strohmaier, & JD
  • Yardstick: Rmax from LINPACK MPP, Ax=b, dense problem (see the sketch below)
  • Updated twice a year: SC'xy in the States in November, meeting in Mannheim, Germany in June
  • All data available from www.top500.org
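To make the yardstick concrete, here is a rough sketch of what an Rmax-style measurement does: solve a dense Ax=b, time it, and convert to flop/s with the standard 2/3 n^3 + 2n^2 operation count. This calls LAPACK's dgesv_ (link with -llapack) and is only an illustration of the idea, not the benchmark code used for the TOP500.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* LAPACK's dense linear solver (Fortran calling convention). */
    extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                       int *ipiv, double *b, int *ldb, int *info);

    int main(void)
    {
        int n = 2000, nrhs = 1, info;
        double *A = malloc((size_t)n * n * sizeof *A);
        double *b = malloc((size_t)n * sizeof *b);
        int *ipiv = malloc((size_t)n * sizeof *ipiv);
        if (!A || !b || !ipiv) return 1;

        /* random, almost surely nonsingular test system */
        srand(1);
        for (long i = 0; i < (long)n * n; i++) A[i] = rand() / (double)RAND_MAX;
        for (int i = 0; i < n; i++) b[i] = rand() / (double)RAND_MAX;

        clock_t t0 = clock();
        dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        /* standard LINPACK operation count for LU solve */
        double flops = (2.0 / 3.0) * n * (double)n * n + 2.0 * (double)n * n;
        printf("n=%d  info=%d  time=%.2fs  rate=%.2f Gflop/s\n",
               n, info, secs, flops / secs / 1e9);
        return 0;
    }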


Fastest Computer Over Time

[Chart: GFlop/s (0-70) vs. year (1990-2000). Systems: Cray Y-MP (8), TMC CM-2 (2048), Fujitsu VP-2600.]

In 1980 a computation that took 1 full year to complete can now be done in ~10 hours!

Fastest Computer Over Time

[Chart: GFlop/s (0-700) vs. year (1990-2000). Systems: Cray Y-MP (8), TMC CM-2 (2048), Fujitsu VP-2600, NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040).]

In 1980 a computation that took 1 full year to complete can now be done in ~16 minutes!


Fastest Computer Over Time

[Chart: GFlop/s (0-7000) vs. year (1990-2000). Systems: Cray Y-MP (8), Fujitsu VP-2600, TMC CM-2 (2048), NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040), ASCI Blue Pacific SST (5808), SGI ASCI Blue Mountain (5040), Intel ASCI Red (9152), Intel ASCI Red Xeon (9632), ASCI White Pacific (7424).]

In 1980 a computation that took 1 full year to complete can today be done in ~27 seconds!

Performance Development

[TOP500 chart, June 1993 through November 2001, log scale from 100 Mflop/s to 1 Pflop/s. Series: N=1, N=500, SUM. The sum grew from 1.167 TF/s to 134 TF/s, N=1 from 59.7 GF/s to 7.23 TF/s, and N=500 from 0.4 GF/s to 94 GF/s. #1 systems over the period: Intel XP/S140 (Sandia), Fujitsu 'NWT' (NAL), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), IBM ASCI White (LLNL). Entry-level systems include SNI VP200EX (Uni Dresden), IBM SP 232 procs (Chase Manhattan NY), Sun HPC 10000 (Merrill Lynch), IBM 604e 69 proc (A&P), and "My Laptop" for reference.]

Notes: Schwab #24; 1/2 per year; 394 systems > 100 Gflop/s; faster than Moore's law; all parallel.


Performance Development

[TOP500 projection chart: Performance [GFlop/s] on a log scale (0.1 to 1,000,000) vs. time, June 1993 extrapolated through the end of the decade. Series: N=1, N=500, Sum, with 1 TFlop/s and 1 PFlop/s reference lines; ASCI and the Earth Simulator marked; "My Laptop" shown for reference. Extrapolation: entry at 1 TFlop/s around 2005 and 1 PFlop/s around 2010.]

Distributed and Parallel Systems

[Spectrum from distributed, heterogeneous systems to massively parallel, homogeneous systems: Grid based computing, SETI@home, Entropia, network of workstations, Beowulf cluster, clusters w/ special interconnect, parallel distributed memory, ASCI Tflops.]

Distributed systems (heterogeneous):
  • Gather (unused) resources
  • Steal cycles
  • System SW manages resources
  • System SW adds value
  • 10% - 20% overhead is OK
  • Resources drive applications
  • Time to completion is not critical
  • Time-shared
  • SETI@home: ~400,000 machines, averaging 27 Tflop/s

Massively parallel systems (homogeneous):
  • Bounded set of resources
  • Apps grow to consume all cycles
  • Application manages resources
  • System SW gets in the way
  • 5% overhead is maximum
  • Apps drive purchase of equipment
  • Real-time constraints
  • Space-shared
  • ASCI White LLNL: 8000 processors, averaging 7.2 Tflop/s


What is Grid Computing?

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Figure: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, advanced visualization.]

The Computational Grid is…

…a distributed control infrastructure that allows applications to treat compute cycles as commodities.

Power Grid analogy:
  • Power producers: machines, software, networks, storage systems
  • Power consumers: user applications

Applications draw power from the Grid the way appliances draw electricity from the power utility.

Seamless, high-performance, ubiquitous, dependable.


Computational Grids and Electric Power Grids

Why the Computational Grid is like the Electric Power Grid:
  • Electric power is ubiquitous
  • Don't need to know the source of the power (transformer, generator) or the power company that serves it

Why the Computational Grid is different from the Electric Power Grid:
  • Wider spectrum of performance
  • Wider spectrum of services
  • Access governed by more complicated issues
    » Security
    » Performance
    » Socio-political factors


An Emerging Grid Community

1995-2000
  • "Grid book" gave a comprehensive view of the state of the art
  • Important infrastructure and middleware efforts initiated
    » Globus
    » Legion
    » Condor
    » NetSolve, Ninf
    » Storage Resource Broker
    » Network Weather Service
    » AppLeS, …


Grids are Hot

  • IPG NAS-NASA: http://nas.nasa.gov/~wej/home/IPG
  • Globus: http://www.globus.org/
  • Legion: http://www.cs.virgina.edu/~grimshaw/
  • AppLeS: http://www-cse.ucsd.edu/groups/hpcl/apples
  • NetSolve: http://www.cs.utk.edu/netsolve/
  • NINF: http://phase.etl.go.jp/ninf/
  • Condor: http://www.cs.wisc.edu/condor/
  • CUMULVS: http://www.epm.ornl.gov/cs/cumulvs.html
  • WebFlow: http://www.npac.syr.edu/users/gcf/
  • LoCI: http://loci.cs.utk.edu/

The Grid


The Grid Architecture Picture

[Architecture diagram. Resource Layer: high speed networks and routers; computers, databases, online instruments. Service Layers: user portals, authentication, co-scheduling, naming & files, events, Grid access & info, problem solving environments, application science portals, resource discovery & allocation, fault tolerance, software.]

Globus Grid Services

  • The Globus toolkit provides a range of basic Grid services
    » Security, information, fault detection, communication, resource management, ...
  • These services are simple and orthogonal
    » Can be used independently, mix and match
    » Programming model independent
  • For each there are well-defined APIs
  • Standards are used extensively
    » E.g., LDAP, GSS-API, X.509, ...
  • You don't program in Globus; it's a set of tools, like Unix


Broad Acceptance of Grids as a Critical Platform for Computing

  • Widespread interest from government in developing computational Grid platforms
    » NSF's Cyberinfrastructure
    » NASA's Information Power Grid
    » DOE's Science Grid


Broad Acceptance of Grids as a Critical Platform for Computing

  • Widespread interest from industry in developing computational Grid platforms
    » IBM, Sun, Entropia, Avaki, Platform, …

On August 2, 2001, IBM announced a new corporate initiative to support and exploit Grid computing. AP reported that IBM was investing $4 billion into building 50 computer server farms around the world.



Grids Form the Basis of a National Information Infrastructure

TeraGrid will provide in aggregate:
  • 13.6 trillion calculations per second
  • Over 600 trillion bytes of immediately accessible data
  • 40 gigabit per second network speed
  • A new paradigm for data-oriented computing
  • Critical for disaster response, genomics, environmental modeling, etc.

August 9, 2001: NSF Awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for TeraGrid


Motivation for NetSolve

Basics: design an easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX and Windows platforms.

  • Client-server design
  • Non-hierarchical system
  • Load balancing and fault tolerance
  • Heterogeneous environment supported
  • Multiple and simple client interfaces
  • Built on standard components


NetSolve Network Enabled Server

  • NetSolve is an example of a Grid based hardware/software server.
  • Based on a Remote Procedure Call model, but with resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
  • Ease of use is paramount.
  • Other examples are NEOS from Argonne and NINF from Japan.


NetSolve

  • Target is not the computer scientist, but the domain scientist
  • Hide logistical details
    » User shouldn't have to worry about how or where (issues about reproducibility)
  • Present the set of available remote resources as a "multi-purpose" machine with a wealth of scientific software


NetSolve: The Big Picture

[Diagram, shown over four animation steps. Components: a Client with Matlab, Mathematica, C, Fortran, Java, and Excel interfaces; the Agent(s) with a schedule database; servers S1-S4; and an IBP Depot for data. No knowledge of the grid required, RPC like. Across the steps: the client sends the request Op(C, A, B) to the agent; the agent answers "S2!" and a handle is passed back; the inputs A and B and the OP plus handle go to the chosen server; the answer C is returned to the client.]


Basic Usage Scenarios

  • Grid based numerical library routines
    » User doesn't have to have the software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
  • Task farming applications
    » "Pleasantly parallel" execution, e.g. parameter studies
  • Remote application execution
    » Complete applications with user specifying input parameters and receiving output

"Blue Collar" Grid Based Computing
  • Does not require deep knowledge of network programming
  • Level of expressiveness right for many users
  • User can set things up, no "su" required
  • In use today, up to 200 servers in 9 countries
  • Can plug into Globus, Condor, NINF, …


NetSolve Agent

  • Name server for the NetSolve system.
  • Information service
    » Client users and administrators can query the hardware and software services available.
  • Resource scheduler
    » Maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources.


NetSolve Agent

Resource Scheduling (cont'd):
  • CPU performance (LINPACK)
  • Network bandwidth, latency
  • Server workload
  • Problem size / algorithm complexity
  • Calculates a "time to compute" for each appropriate server (see the sketch below)
  • Notifies client of most appropriate server
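The slides do not spell out the cost model, so here is a minimal sketch of the idea, assuming the agent combines data-transfer time with compute time inflated by the server's current workload; the struct fields and the formula are illustrative, not NetSolve's actual code.

    /* Per-server data the agent might track (illustrative fields). */
    typedef struct {
        double flops;       /* measured LINPACK rate, flop/s          */
        double bandwidth;   /* client <-> server bandwidth, bytes/s   */
        double latency;     /* round-trip latency, seconds            */
        double workload;    /* fraction of the CPU already busy, 0..1 */
    } server_info;

    /* Estimated "time to compute": transfer time for the problem data
     * plus compute time for the problem's flop count, scaled up by the
     * server's current workload.  Purely a sketch of the idea.        */
    double time_to_compute(const server_info *s,
                           double problem_bytes, double problem_flops)
    {
        double transfer = s->latency + problem_bytes / s->bandwidth;
        double compute  = problem_flops / (s->flops * (1.0 - s->workload));
        return transfer + compute;
    }

The agent would evaluate an estimate like this for every server offering the requested service and report the server with the smallest value to the client.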


NetSolve Client

  • Function based interface: the client program embeds a call from NetSolve's API to access additional resources.
  • Interface available to C, Fortran, Matlab, and Mathematica.
  • Opaque networking interactions.
  • NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, … (a sketch of the C interface follows)
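A minimal sketch of the C function based interface, using the call names that appear later on these slides (netsl, netslnb); the header name, the problem name string, and the argument order are assumptions for illustration, not a definitive API reference.

    #include <stdio.h>
    #include "netsolve.h"            /* header name assumed */

    #define N 100

    int main(void)
    {
        static double A[N*N], B[N*N], C[N*N];
        /* ... fill A and B ... */

        /* Blocking call: returns once the result C is available. */
        int status = netsl("matmul", N, A, B, C);
        if (status < 0) {
            fprintf(stderr, "netsl failed: %d\n", status);
            return 1;
        }

        /* Non-blocking variant: netslnb() would return a request handle
         * immediately; the client does other work, then waits on or
         * polls the handle before using C.                            */
        return 0;
    }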


NetSolve Client

  • Intuitive and easy to use. Matlab matrix multiply, e.g.:
    Local call:    A = matmul(B, C);
    Via NetSolve:  A = netsolve('matmul', B, C);
  • Possible parallelisms hidden.

NetSolve Client

  i. Client makes request to agent.
  ii. Agent returns list of servers.
  iii. Client tries first one to solve problem.


Generating New Services in NetSolve

  • Add additional functionality:
    » Describe the interface (via a Java GUI or the NetSolve parser/compiler)
    » Generate wrapper
    » Install into server
  • Example problem description file (a sketch of the kind of wrapper this produces follows):
    @PROBLEM dgesv
    @DESCRIPTION
    This is a linear solver for dense matrices from the LAPACK library. Solves Ax=b.
    @INPUT 2
    @OBJECT MATRIX DOUBLE A Double precision matrix
    @OBJECT VECTOR DOUBLE b Right hand side
    @OUTPUT 1
    @OBJECT VECTOR DOUBLE x
    …
  • [Diagram: server with its existing services plus the new service. New service added!]
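Conceptually, the generated wrapper unpacks the described input objects and calls the underlying library routine. A rough sketch of what such a wrapper could look like for dgesv, purely illustrative and not the code NetSolve actually emits:

    #include <stdlib.h>

    /* LAPACK's dense linear solver (Fortran calling convention). */
    extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                       int *ipiv, double *b, int *ldb, int *info);

    /* Sketch of a generated service wrapper: take the matrix A and the
     * right hand side b named in the @INPUT section, solve Ax=b, and
     * return x as the @OUTPUT object.                                  */
    int dgesv_service(int n, double *A, const double *b, double *x)
    {
        int nrhs = 1, info = 0;
        int *ipiv = malloc(n * sizeof *ipiv);
        if (!ipiv) return -1;

        for (int i = 0; i < n; i++) x[i] = b[i];  /* dgesv overwrites its b */
        dgesv_(&n, &nrhs, A, &n, ipiv, x, &n, &info);

        free(ipiv);
        return info;                              /* 0 on success */
    }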


Task Farming - Multiple Requests To Single Problem

  • A solution: many calls to netslnb( );  /* non-blocking */
  • Farming solution: single call to netsl_farm( );
    » Request iterates over an "array of input parameters."
    » Adaptive scheduling algorithm.
    » Useful for parameter sweeping and independently parallel applications.

A sketch contrasting the two approaches follows.
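A minimal sketch of the contrast, using the call names shown above; the problem name "simulate" and the exact netsl_farm argument convention are assumptions for illustration, since NetSolve's real farming interface describes its input arrays differently.

    #include "netsolve.h"   /* header name assumed, as in the earlier sketch */

    #define NTASKS 100

    /* Hand-rolled farming: one non-blocking request per parameter set;
     * the client must track every request handle and wait on them all. */
    void farm_by_hand(double params[NTASKS], double results[NTASKS])
    {
        int requests[NTASKS];
        for (int i = 0; i < NTASKS; i++)
            requests[i] = netslnb("simulate", params[i], &results[i]);
        /* ... wait on all requests, handle failures, rebalance ... */
    }

    /* Farming interface: one call iterates over the whole array of
     * inputs and lets NetSolve schedule the pieces adaptively.        */
    void farm_with_netsolve(double params[NTASKS], double results[NTASKS])
    {
        netsl_farm("i=0,99", "simulate", params, results);
    }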


Data Persistence

  • Chain together a sequence of NetSolve requests.
  • Analyze parameters to determine data dependencies.
    » Essentially a DAG is created where nodes represent computational modules and arcs represent data flow.
  • Transmit the superset of all input/output parameters and make it persistent near the server(s) for the duration of sequence execution.
  • Schedule individual request modules for execution.


Data Persistence (cont'd)

Without a sequence, each call ships its inputs and results between client and server:
  netsl("command1", A, B, C);   /* command1(A, B), result C */
  netsl("command2", A, C, D);   /* command2(A, C), result D */
  netsl("command3", D, E, F);   /* command3(D, E), result F */

With a sequence, the intermediate outputs C and D stay near the server(s); the client sends only the inputs A, B, E and gets back only the final result F:
  netsl_begin_sequence( );
  netsl("command1", A, B, C);
  netsl("command2", A, C, D);
  netsl("command3", D, E, F);
  netsl_end_sequence(C, D);


University of Tennessee Deployment: Scalable Intracampus Research Grid SInRG

  • Federated ownership: CS, Chem. Eng., Medical School, Computational Ecology, El. Eng.
  • Real applications, middleware development, logistical networking

The Knoxville Campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX). The UT campus consists of a meshed ATM OC-12 being migrated over to switched Gigabit by early 2002.

NPACI Alpha Project - MCell: 3-D Monte Carlo Simulation of Neurotransmitter Release Between Cells

  • UCSD (F. Berman, H. Casanova, M. Ellisman), Salk Institute (T. Bartol), CMU (J. Stiles), UTK (Dongarra, M. Miller, R. Wolski)
  • Study how neurotransmitters diffuse and activate receptors in synapses
  • [Image legend: blue unbound, red singly bound, green doubly bound closed, yellow doubly bound open]


IPARS: Integrated Parallel Accurate Reservoir Simulator
  • Mary Wheeler's group, UT-Austin
  • Reservoir and environmental simulation: models black oil, waterflood, compositions; 3D transient flow of multiple phases
  • Integrates existing simulators; framework simplifies development
  • Provides solvers, handling for wells, table lookup; pre/postprocessor, visualization
  • Full IPARS access without installation
  • IPARS interfaces: C, FORTRAN, Matlab, Mathematica, and Web
  • [Diagram: Web interface, Web server, NetSolve client, IPARS-enabled servers]

NetSolve and SCIRun
  • SCIRun torso defibrillator application, Chris Johnson, U of Utah


NetSolve: A Plug into the Grid

[Diagram, built up over three slides. Front-ends (C, Fortran, Matlab, Mathematica, SCIRun, custom PSEs) reach NetSolve through remote procedure call. NetSolve's Grid middleware provides resource discovery, system management, resource scheduling, and fault tolerance. Proxies (Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy) connect it to Grid back-ends: NetSolve servers under Globus, Ninf servers, plain NetSolve servers, and NetSolve servers under Condor.]


Things Not Touched On

  • Security: using Kerberos V5 for authentication.
  • Separate server characteristics: implementing hardware and software servers.
  • Hierarchy of agents: more scalable configuration.
  • Monitor the NetSolve network: track and monitor usage, network status via the Network Weather Service.
  • Internet Backplane Protocol: middleware for managing and using remote storage.
  • Fault tolerance: local / global configurations; dynamic nature of servers.
  • Automated adaptive algorithm selection: dynamically determine the best algorithm based on system status and the nature of the user problem.


Conclusion

  • Exciting time to be in scientific computing
  • Network computing is here
  • The Grid offers tremendous opportunities for collaboration
  • Important to develop algorithms and software that will work effectively in this environment


Contributors to These Ideas

Top500:
  • Erich Strohmaier, NERSC
  • Horst Simon, NERSC
  • Hans Meuer, Mannheim U
NetSolve:
  • Henri Casanova, UCSD
  • Michelle Miller, UTK
  • Sathish Vadhiyar, UTK
  • Fran Berman, UCSD/SDSC

For additional information see…
  • www.netlib.org/top500/
  • icl.cs.utk.edu/netsolve/
  • www.cs.utk.edu/~dongarra/
Many opportunities within the group at Tennessee.