An Overview of High Performance Computing and Trends


SLIDE 1
  • An Overview of High Performance Computing and Trends
  • Outline for the Next 3 Days

SLIDE 2
  • Innovative Computing Laboratory

SLIDE 3
  • Computational Science
  • Why Turn to Simulation?

SLIDE 4
  • Technology Trends: Microprocessor Capacity
  • 2X transistors/chip every 1.5 years, called "Moore's Law"

Moore’s Law

Microprocessors have become smaller, denser, and more powerful. The same trend applies not just to processors but also to bandwidth, storage, etc.

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

[Figure: "Moore's Law" timeline of machine performance, 1950-2010, from 1 KFlop/s to 1 PFlop/s: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific.]
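For a feel of what this doubling rule compounds to, here is a small illustrative calculation (an addition for this write-up, not taken from the slides):

```python
# Illustrative only: compound growth implied by "2X transistors/chip every 1.5 years".

def moore_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Growth factor after `years` if capacity doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

if __name__ == "__main__":
    for years in (1.5, 5, 10, 15):
        print(f"{years:>4} years -> {moore_factor(years):8.1f}x")
    # 10 years of 2X-per-1.5-years growth is roughly 100x, and 15 years roughly 1000x,
    # i.e. about one Kflop/s -> Mflop/s -> Gflop/s step on the timeline above.
```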
SLIDE 5
  • H. Meuer, H. Simon, E. Strohmaier, & JD
  • Listing of the 500 most powerful computers in the World

  • Yardstick: Rmax from LINPACK MPP

Ax=b, dense problem

  • Updated twice a year

SC'xy in the States in November; meeting in Mannheim, Germany in June

  • All data available from www.top500.org
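Rmax comes from timing the solution of one dense n x n system Ax = b; the rate is computed with the benchmark's standard operation count of 2/3 n^3 + 2 n^2 flops. A minimal sketch of that bookkeeping (NumPy's solver stands in here for the tuned benchmark code; it is only an illustration):

```python
import time
import numpy as np

def linpack_rate_gflops(n: int, seed: int = 0) -> float:
    """Time a dense Ax=b solve and convert to Gflop/s with the LINPACK flop count."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)                   # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    # Correctness matters as much as speed: check the scaled residual.
    assert np.linalg.norm(A @ x - b) / np.linalg.norm(b) < 1e-8
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2     # standard LINPACK operation count
    return flops / elapsed / 1e9

if __name__ == "__main__":
    print(f"n = 2000: {linpack_rate_gflops(2000):.2f} Gflop/s")
```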

"# "# "# "#

  • $$%&'

$$%&' $$%&' $$%&'

,:1),!!! "#;,)<

Fastest Computer Over Time

10 20 30 40 50 60 70 1990 1992 1994 1996 1998 2000 Year

GFlop/s

X Y (S c a tte r) 1

Cray Y-MP (8) TMC CM-2 (2048) Fujitsu VP-2600

SLIDE 6

Fastest Computer Over Time

[Figure: fastest computer over time, 1990-2000, GFlop/s scale 100-700, adding NEC SX-3 (4), TMC CM-5 (1024), Fujitsu VPP-500 (140), Intel Paragon (6788), Hitachi CP-PACS (2040).]

Fastest Computer Over Time

[Figure: fastest computer over time, 1990-2000, GFlop/s scale 1000-7000, adding Intel ASCI Red (9152), ASCI Blue Pacific SST (5808), SGI ASCI Blue Mountain (5040), Intel ASCI Red Xeon (9632), ASCI White Pacific (7424).]

SLIDE 7
  • 10TF ASCI White

512 Nighthawk 16-way SMP nodes

  • 12.3 TF peak performance

4.0 TB memory
159 TB disk
2x I/O size and delivered bandwidth over SST
2.5x external network improvement
Sufficient swap for GANG scheduling

Livermore National Laboratory – IBM Blue Pacific and White SMP Superclusters

4TF Blue Pacific SST running

3 x 480 4-way SMP nodes
3.9 TF peak performance
2.6 TB memory
2.5 Tb/s bisectional bandwidth
62 TB disk
6.4 GB/s delivered I/O bandwidth

Fastest Computer Over Time

[Figure: fastest computer over time extended to 2002, TFlop/s scale 10-70: the ASCI machines are joined in 2002 by the Japanese Earth Simulator (NEC, 5104 processors).]

SLIDE 8
  • Number 1 / Number 2

Donna Crawford, Director of Computing, LLNL
Tetsuya Satoh, Director-General, Earth Simulator Center

  • TOP500 list - Data shown

  • Manufacturer

Manufacturer or vendor

  • Computer Type

Indicated by manufacturer or vendor

  • Installation Site

Customer

  • Location

Location and country

  • Year

Year of installation / last major update

  • Customer Segment

Academic, Research, Industry, Vendor, Classified

  • # Processors

Number of processors

  • Rmax

Maximal LINPACK performance achieved

  • Rpeak

Theoretical peak performance

  • Nmax

Problem size for achieving Rmax

  • N1/2

Problem size for achieving half of Rmax
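One immediate use of these fields: Rmax/Rpeak gives the LINPACK efficiency of a system. A one-line helper, using as an example the Earth Simulator's Rmax of 35.86 TF/s from the TOP10 table below against its 40 TF/s theoretical peak:

```python
def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on the LINPACK benchmark (same units for both)."""
    return rmax / rpeak

# Earth Simulator: Rmax 35.86 TF/s, Rpeak 40 TF/s  ->  roughly 90% of peak.
print(f"Earth Simulator LINPACK efficiency: {linpack_efficiency(35.86, 40.0):.1%}")
```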

SLIDE 9
  • TOP10

Rank  Manufacturer  Computer                            Rmax [TF/s]  Installation Site                        Country  Year  Area of Installation  # Proc
  1   NEC           Earth-Simulator                        35.86     Earth Simulator Center                   Japan    2002  Research                5120
  2   IBM           ASCI White SP Power3                    7.23     Lawrence Livermore National Laboratory   USA      2000  Research                8192
  3   HP            AlphaServer SC ES45 1 GHz               4.46     Pittsburgh Supercomputing Center         USA      2001  Academic                3016
  4   HP            AlphaServer SC ES45 1 GHz               3.98     Commissariat a l'Energie Atomique (CEA)  France   2001  Research                2560
  5   IBM           SP Power3 375 MHz                       3.05     NERSC/LBNL                               USA      2001  Research                3328
  6   HP            AlphaServer SC ES45 1 GHz               2.92     Los Alamos National Laboratory           USA      2002  Research                2048
  7   Intel         ASCI Red                                2.38     Sandia National Laboratory               USA      1999  Research                9632
  8   IBM           pSeries 690 1.3 GHz                     2.31     Oak Ridge National Laboratory            USA      2002  Research                 864
  9   IBM           ASCI Blue Pacific SST, IBM SP 604e      2.14     Lawrence Livermore National Laboratory   USA      1999  Research                5808
 10   IBM           pSeries 690 1.3 GHz                     2.00     IBM/US Army Research Lab (ARL)           USA      2002  Vendor                   768

TOP500 - Performance

[Figure: TOP500 performance over time, Jun 1993 - Jun 2002, log scale from 100 Mflop/s to 1 Pflop/s; curves for N=1, N=500, and SUM. The sum grew from 1.17 TF/s to 220 TF/s, N=1 from 59.7 GF/s to 35.8 TF/s, and N=500 from 0.4 GF/s to 134 GF/s. N=1 machines: Fujitsu 'NWT' NAL, Intel ASCI Red Sandia, IBM ASCI White LLNL, NEC Earth Simulator.]
SLIDE 10

  • "Moore's Wall" (Horst Simon, NERSC)
  • "Moore's Law"

  • Performance Extrapolation

[Figure: TOP500 performance (N=1, N=500, Sum) extrapolated beyond 2002 on a log scale from 100 MFlop/s to 10 PFlop/s, with markers for the Earth Simulator and ASCI Purple.]
SLIDE 11
  • Manufacturers (number of systems)

[Figure: number of TOP500 systems by manufacturer (Cray, SGI, IBM, Sun, HP, TMC, Intel, Fujitsu, NEC, Hitachi, Others), Jun 1993 - Jun 2002, 0-500 systems. Currently HP 168, IBM 164.]

  • Manufacturers (performance share)

[Figure: share of TOP500 performance by manufacturer (same vendors), Jun 1993 - Jun 2002, 0-100%. Currently IBM 33%, HP 22%, NEC 19%.]

SLIDE 12
  • Sun Systems on the Top500
  • French Top500 Computers
SLIDE 13
  • Continents (number of systems)

[Figure: number of TOP500 systems by continent (USA/Canada, Europe, Japan, Others), Jun 1993 - Jun 2002, 0-500 systems. US 238 (242), Europe 171 (162), Japan 53 (56).]

  • Continents - Performance

[Figure: share of TOP500 performance by continent, Jun 1993 - Jun 2002, 0-100%. US 45% (59), Europe 24% (22), Japan 25% (13).]

SLIDE 14
  • Europe - Countries

[Figure: number of TOP500 systems by European country (Germany, UK, France, Scandinavia, Benelux, Switzerland, Others), Jun 1993 - Jun 2002, 0-150 systems. Currently G 64, UK 37, F 23, SK 12, BEL 14, CH 3.]

  • Kflops per Inhabitant

[Figure: Kflops per inhabitant by country: Japan 450, USA 358, Germany 245, Scandinavia 207, UK 203, France 158, Switzerland 141, Italy 67, Luxembourg 643. Totals: Japan 57 TF/s, US 99 TF/s.]

SLIDE 15
  • Customer Type

[Figure: number of TOP500 systems by customer type (Research, Industry, Academic, Classified, Vendor), Jun 1993 - Jun 2002, 0-500 systems.]

  • Industrial Customer Segments

[Figure: number of industrial TOP500 systems by segment (Engineering, Commercial, Unknown), Jun 1993 - Jun 2002, 0-250 systems.]
SLIDE 16
  • Excerpt from TOP500

Rank  Manufacturer     Computer                        Rmax [GF/s]  Installation Site    Country  Area        # Proc
 ...
  40  IBM              SP Power3                           795      Charles Schwab       USA      Finance        768
  66  IBM              SP Power3                           594      Sprint PCS           USA      Telecom        320
  67  IBM              SP Power4                           555      EDS General Motors   USA      Automotive     224
  73  IBM              SP Power3                           546      State Farm           USA      Database       520
 125  IBM              Netfinity P3 Ethernet Cluster       366      WesternGeco          UK       Geophysics    1280
 127  Hewlett-Packard  SuperDome HyperPlex                 361      Centrica Plc         UK       Energy         196
 ...

  • Customer Types - Performance

[Figure: share of TOP500 performance by customer type (Research, Industry, Academic, Classified, Vendor), Jun 1993 - Jun 2002, 0-100%.]

SLIDE 17
  • Producers

[Figure: number of TOP500 systems by producing region (USA, Japan, Europe), Jun 1993 - Jun 2002, 0-500 systems.]

  • Producers - Performance

[Figure: share of TOP500 performance by producing region (USA, Japan, Europe), Jun 1993 - Jun 2002, 0-100%.]

SLIDE 18
  • Processor Type

[Figure: number of TOP500 systems by processor type (Scalar, Vector, SIMD), Jun 1993 - Jun 2002, 0-500 systems.]

  • Chip Technology

[Figure: number of TOP500 systems by chip technology (CMOS off-the-shelf, CMOS proprietary, ECL), Jun 1993 - Jun 2002, 0-500 systems.]
SLIDE 19
  • Chip Technology

[Figure: number of TOP500 systems by processor family (Alpha, Power, HP, Intel, MIPS, Sparc, other COTS, proprietary), Jun 1993 - Jun 2002, 0-500 systems.]

  • Architectures

[Figure: number of TOP500 systems by architecture (Single Processor, SMP, MPP, SIMD, Constellation, Cluster - NOW), Jun 1993 - Jun 2002, 0-500 systems. Representative machines: Y-MP C90, Sun HPC, Paragon, CM5, T3D, T3E, SP2, Cluster of Sun HPC, ASCI Red, CM2, VP500, SX3.]

Constellation: # of processors per node ≥ # of nodes

SLIDE 20
  • Performance Distribution, June 2002

[Figure: performance [TFlop/s] vs. rank 1-500, on a 0-40 TFlop/s scale, with the "½ life" point marked.]

[Figure: the same distribution on a 0-2 TFlop/s scale.]

SLIDE 21
  • Cumulative Performance, June 2002

[Figure: cumulative performance [TFlops] vs. rank (log scale 1-1000), reaching 222 TF/s. Rank 58 accounts for ½ of the cumulative performance.]

SLIDE 22
  • Performance Distribution

[Figure: rank of ½ TOP500 performance over time, Jun 1993 - Jun 2002, range 10-80.]
SLIDE 23
  • To Run Benchmark for TOP500

[Figure: number of machines vs. performance. 1976: the supercomputing "island"; today: a continuum.]

SLIDE 24
  • Petaflop Computers Within the Next Decade
  • SETI@home: Global Distributed Computing

SLIDE 25
  • SETI@home
  • Grid Computing - from ET to Anthrax

SLIDE 26
  • Petaflops (10^15 flop/s) Computer Today?

  • High-Performance Computing Directions: Beowulf-class PC Clusters

Definition / Advantages:

Enabled by PC hardware, networks, and operating systems; achieves the capabilities of scientific workstations at a fraction of the cost, with the availability of industry-standard message passing libraries. However, it is much more of a "contact sport."
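To give a flavor of the "industry standard message passing" that such clusters rely on, here is a minimal example using MPI through the mpi4py binding (mpi4py and the process counts are assumptions for illustration; they are not named on the slide):

```python
# Run with, e.g.:  mpirun -np 4 python allreduce_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # id of this process within the job
size = comm.Get_size()          # total number of processes across the cluster

local = float(rank + 1)         # each process computes its own partial result
total = comm.allreduce(local, op=MPI.SUM)   # collective sum visible to every process

if rank == 0:
    print(f"{size} processes, sum of partial results = {total}")
```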

SLIDE 27
  • Excerpt from TOP500

Rank  Manufacturer  Computer                         Rmax [GF/s]  Installation Site                          Country  # Proc
 ...
  30  Self-made     Cplant/Ross                          707      Sandia National Lab                        USA        1369
  34  IBM           Titan Cluster, Itanium 800 MHz       594      NCSA                                       USA         320
  39  NEC           Magi Cluster, PIII 933 MHz           654      CBRC - Tsukuba Advanced Computing Center   Japan      1024
  40  Self-made     SCoreIII, PIII 933 MHz               618      Real World Computing, Tsukuba              Japan      1024
  41  IBM           Netfinity Cluster, PIII 1 GHz        594      NCSA                                       USA        1024
 320  Dell          PowerEdge Cluster, Windows2000       121      Cornell Theory Center                      USA         252
 ...

  • Performance Numbers on RISC Processors
SLIDE 28
  • Pentium 4 - SSE2

Today's "Sweet Spot" in Price/Performance

SLIDE 29
  • NOW - Cluster

[Figure: number of NOW/cluster systems in the TOP500 by processor family (AMD, Intel, IBM Netfinity, Alpha, HP Alpha Server, Sparc), Jun 1997 - Jun 2002, 0-80 systems.]

SLIDE 30
  • Notes on the Earth Simulator

SLIDE 31
  • Earth Simulator Research and Development Center, Japan Atomic Energy Research Institute

High resolution global models

predictions of global warming, etc.

High resolution regional models

predictions of El Niño events and Asian monsoon, etc.

High resolution local models

predictions of weather disasters such as typhoons, localized torrential downpour, oil spill, downburst, etc.

  • Atmospheric and oceanographic science (the high resolution models above)
  • Solid earth science: simulation of the earthquake generation process; seismic wave tomography; a regional model to describe crust/mantle activity in the Japanese Archipelago region; a global dynamic model to describe the entire solid earth as a system

Earth Simulator

  • Earth Simulator

SLIDE 32
  • Earth Simulator in a Nutshell

[Diagram: 40 clusters (#0-#39), each of 16 processor nodes (#0-#15); each node contains 8 vector processors with shared memory; HDD and magnetic tape (MT) archive / tape library attached.]

Specifications:
  Peak performance / processor:  8 Gflops
  Peak performance / node:       64 Gflops
  Shared memory / node:          16 GB
  Total number of processors:    5,120
  Total number of nodes:         640
  Total peak performance:        40 Tflops
  Total main memory:             10 TB
  Interconnection network:       16 GB/s x 2

Earth Simulator Research and Development Center

  • Architecture: a MIMD-type, distributed-memory, parallel system consisting of computing nodes in which vector-type multiprocessors are tightly connected by sharing main memory.

  • Total number of processor nodes: 640
  • Number of PE’s for each node: 8
  • Total number of PE’s: 5120
  • Peak performance of each PE: 8 GFLOPS
  • Peak performance of each node: 64 GFLOPS
  • Main memory: 10 TB (total); shared memory / node: 16 GB
  • Interconnection network: Single-Stage Crossbar Network
  • Performance: assuming an efficiency of 12.5% of the 40 TFLOPS peak, the effective performance for an atmospheric circulation model is more than 5 TFLOPS.
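As a quick arithmetic check, the headline figures above are mutually consistent (nothing here goes beyond the numbers already quoted):

```python
# Consistency check of the Earth Simulator specification numbers.
NODES = 640
PES_PER_NODE = 8
PEAK_PER_PE_GFLOPS = 8.0
SHARED_MEMORY_PER_NODE_GB = 16

total_pes = NODES * PES_PER_NODE                            # 5,120 processors
peak_tflops = total_pes * PEAK_PER_PE_GFLOPS / 1000.0       # 40.96, quoted as 40 Tflops
memory_tb = NODES * SHARED_MEMORY_PER_NODE_GB / 1024.0      # 10 TB of main memory
effective_tflops = 0.125 * 40.0                             # 12.5% efficiency -> 5 Tflops

print(total_pes, round(peak_tflops, 2), memory_tb, effective_tflops)
```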

Outline of the Earth Simulator Computer

SLIDE 33
  • Comparison of vector processors (R&D results, Earth Simulator Research and Development Center)

SX-4: 8 Gflops (2 Gflop/s x 4), clock 125 MHz, 0.35 µm CMOS LSI, 37 x 4 = 148 LSIs, 386 mm x 457 mm
SX-5: 8 Gflop/s, clock 250 MHz, 0.25 µm CMOS LSI, 32 LSIs, 225 mm x 225 mm
Earth Simulator: 8 Gflop/s, clock 500 MHz / 1 GHz, 0.15 µm CMOS LSI, 1-chip processor, 110 mm x 115 mm

  • Comparison of cabinets for 1 node (air cooling, R&D results)

Present distributed-memory supercomputer (SX-4), 1 node: peak performance 64 Gflops, main memory 16 GB, electric power 90 KVA
Earth Simulator, 1 node: peak performance 64 Gflops, main memory 16 GB, electric power 8 KVA
(Cabinet footprints of about 0.7 m to 1 m by 6 m to 7 m are compared in the figure.)

SLIDE 34
  • Earth Simulator Research and Development Center

R&D Issues on Hardware Technologies (R&D results)

(1) LSI Technology
  • Enhancement of clock cycle: 150 MHz -> 500 MHz (partly 1 GHz)
  • Development of high density LSI: 0.15 µm CMOS + Cu interconnection (8 layers); 1.5-2.0 million transistors/cm2 -> 10 million transistors/cm2
  • Enlargement of chip size (about 2 cm x 2 cm)

(2) Packaging Technology
  • Build-up PCB (110 mm x 115 mm)
  • Line width / spacing: 25 µm / 25 µm
  • 6 core layers + 4 build-up layers on both surfaces
  • Number of pins/chip: <1000 (present) -> 4000-5000

(3) Cooling Technology
  • Air cooling using heat pipe technology (max. 170 W per chip)

(4) Board to Board Interconnection Technology
  • Interface connector: 0.5 mm pitch surface mount
  • Interface cable: 0.6 mm diameter coaxial cable, 3.8 ns/m delay time

(5) PN-IN Interconnection Technology
  • 40 m transmission distance with fine-tuned equalizer circuit

High performance one-chip vector processor: OCVP-ES

Earth Simulator Research and Development Center

  • Connection between processor nodes (crossbar network) (R&D results)

[Diagram: 640 processor nodes (PN #0-#639) in 320 cabinets connected through 128 crossbar switches (XSW #0-#127) in 64 cabinets, plus crossbar control units XCT #0 and XCT #1.]

Total number of cables: 640 x 130 = 83,200
Total length of cables: 2,900 km
Total weight of cables: 220 t
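The cable totals hang together; a small check (the average length per cable is derived here, not quoted on the slide):

```python
# Consistency check on the interconnection-network cabling figures.
cables = 640 * 130                    # 83,200 cables
total_length_km = 2_900               # total IN cable length (a later slide rounds this to 3,000 km)
average_length_m = total_length_km * 1000 / cables
print(f"{cables} cables, average {average_length_m:.1f} m each")   # ~35 m, within the 40 m transmission limit
```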

SLIDE 35
  • Bird's-eye View of the Earth Simulator System (65 m x 50 m)

[Figure labels: Processor Node (PN) cabinets, Interconnection Network (IN) cabinets, double floor for IN cables, cartridge tape library system, disks, power supply system, air conditioning system.]

  • Cross-sectional View of the Earth Simulator Building

[Figure labels: air-conditioning system and return duct, double floor for IN cables and air-conditioning, lightning protection system, power supply system, seismic isolation system.]

SLIDE 36
  • New Earth Simulator Facilities

Building for operation and research; building for computer system; power plant

  • Wiring of interconnection network cables

Earth Simulator Research and Development Center

SLIDE 37
  • Cables (Earth Simulator Research and Development Center, R&D results)

Total length of IN cables: 3,000 km

SLIDE 38
  • Wiring of interconnection network cables

Earth Simulator Research and Development Center

  • Processor Cabinets

SLIDE 39
  • Earth Simulator Research and Development Center

Panoramic view of the Earth Simulator System, January 2002

  • Peak Performance

SLIDE 40
  • Earth Simulator Computer (ESC)

Machine at the Top of the List

Computer                                     Year  Size of Problem  # Proc  Factor  Theoretical Peak [Gflop/s]  Factor  Measured [Gflop/s]
Fujitsu NWT                                  1993        31920        140     -              236                  -           124.5
Intel Paragon XP/S MP                        1994       128600       6768    1.4             338                 2.3          281.1
Intel Paragon XP/S MP                        1995       128600       6768    1.0             338                 1.0          281.1
Hitachi CP-PACS                              1996       103680       2048    1.8             614                 1.3          368.2
Intel ASCI Option Red (200 MHz Pentium Pro)  1997       235000       9152    3.0            1830                 3.6         1338
ASCI Blue-Pacific SST, IBM SP 604E           1998       431344       5808    2.1            3868                 1.6         2144
ASCI Red, Intel Pentium II Xeon core         1999       362880       9632    0.8            3207                 1.1         2379
ASCI White-Pacific, IBM SP Power 3           2000       430000       7424    3.5           11136                 2.1         4938
ASCI White-Pacific, IBM SP Power 3           2001       518096       7424    1.0           11136                 1.5         7226
Earth Simulator Computer, NEC                2002      1041216       5104    3.7           40832                 4.9        35610

(Each "Factor" column is the increase over the previous year, for theoretical peak and measured performance respectively.)

SLIDE 41
  • LINPACK Benchmark List

[Figure: leading LINPACK results by installation site: CEA, LANL, PSC, ESC, LLNL, PSC, LBNL, SNL, LLNL, SNL, LANL, U Tokyo, SNL, LANL, NOO, SNL, Osaka, Leibniz, US Government, Leibniz, LBNL, IBM.]

  • Performance of AFES Climate Code

SLIDE 42
  • AFES physics parameterizations

Cumulus convection (condensation, precipitation, convection):
  • Simplified Arakawa-Schubert (Arakawa and Schubert, 1974; Moorthi & Suarez, 1992)
  • Kuo scheme + shallow convection
  • Manabe's moist convection

Large-scale condensation: other cloud processes and prediction of cloud water (Le Treut & Li, 1990)
Radiation: 2-stream k-distribution scheme (Nakajima & Tanaka, 1986)
Vertical diffusion: transport of heat, momentum, and moisture in the PBL; Level 2 turbulence scheme (Mellor & Yamada, 1974, 1982)
Surface flux: fluxes in the surface boundary layer (Louis, 1979; Mellor et al., 1992)
Ground process: multi-layer heat conduction, hydrology (Manabe, 1979), ground moisture (Manabe et al., 1965), frozen soil process (Clapp & Hornberger, 1978), bucket model (Kondo, 1993)
Ocean mixing layer: ocean temperature (Wilson et al., 1987), sea ice
Gravity wave-induced drag: orographic effect (McFarlane, 1987)
Others: dry convection adjustment

  • Parallelization of AFES

SLIDE 43
  • Optimization Strategies for AFES Climate Model

Grid points: 3840 x 1920 x 96 (I = 3840, J = 1920, K = 96)

[Figure: parallel decomposition between grid space and spectral space across processor nodes (PN), with FFT / inverse FFT between the two.]

High resolution (10 km) results in increased cost concentration on the vector-tailored dynamics part (>75%).

MPI among nodes / microtasking within a node: a domain decomposition that fully exploits the parallel nodes (>99% parallelization ratio) with less communication.

Reduced load imbalance due to improved algorithms (e.g., use of the increasingly popular Kuo cloud physics model).

Improved vector performance with DO-loop optimization, combined with assembler coding for part of the matrix operations.
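Below is a minimal sketch of this kind of decomposition, assuming a simple latitude-band split over nodes and 8-byte values; the real AFES decomposition (and its spectral/grid transposes) is more involved than this:

```python
# Illustrative decomposition of the 3840 x 1920 x 96 grid across Earth Simulator nodes.
I, J, K = 3840, 1920, 96          # longitudes, latitudes, vertical levels
NODES, CPUS_PER_NODE = 640, 8     # MPI across nodes, microtasking within a node

def band(rank: int, workers: int, n: int) -> tuple[int, int]:
    """Half-open [start, stop) range of latitude rows owned by `rank`."""
    base, extra = divmod(n, workers)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

j0, j1 = band(0, NODES, J)                    # rows owned by node 0
points = I * (j1 - j0) * K
print(f"node 0 owns latitude rows {j0}..{j1 - 1}: {points:,} grid points")
print(f"per CPU within the node: {points // CPUS_PER_NODE:,} points")
print(f"one double-precision field per node: {points * 8 / 2**20:.1f} MiB")
```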

  • Strategy for Performance Enhancement for the ES

SLIDE 44
  • Earth Simulator, T1279L96 (3840 x 1920 x 96, 10.4 km)

Total CPUs   Nodes   CPUs/Node   Elapsed time (sec)   Peak (TFLOPS)   Sustained (TFLOPS)   Ratio (%)
      80       80        1             238.04              0.64             0.52             81.1
     160      160        1             119.26              1.28             1.04             81.0
     320      320        1              60.52              2.56             2.04             79.8
     640       80        8              32.06              5.12             3.86             75.3
    1280      160        8              16.24             10.24             7.61             74.3
    2560      320        8               8.52             20.48            14.50             70.8

26.6 TFLOP/s sustained performance with the 640 full nodes (5120 CPUs / peak 40 TFLOP/s)

  • Effective Performance of the AFES Climate Code on the ES with Kuo's Cumulus Convection Scheme for a T1279L96 Resolution Model

[Figure: sustained-to-peak ratios of 64.6%, 70.8%, 74.3%, and 75.3% across configurations; 26.6 TFLOP/s sustained with the 640 full nodes (5120 CPUs / peak 40 TFLOP/s).]
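The Ratio column is simply sustained/peak; recomputing it from the rows above is a useful check (small differences from the quoted percentages are rounding in the reported TFLOPS values):

```python
# (total CPUs, peak TFLOPS, sustained TFLOPS) from the AFES table above,
# plus the full-machine figure quoted in the text.
rows = [
    (80, 0.64, 0.52),
    (160, 1.28, 1.04),
    (320, 2.56, 2.04),
    (640, 5.12, 3.86),
    (1280, 10.24, 7.61),
    (2560, 20.48, 14.50),
    (5120, 40.0, 26.6),
]
for cpus, peak, sustained in rows:
    print(f"{cpus:>5} CPUs: {sustained / peak:6.1%} of peak")
```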

SLIDE 45
  • Results from AFES

Precipitation (312 km, T42L24)

SLIDE 46
  • Precipitation (125 km, T106L24)
  • Precipitation (20.8 km, T639L24)

SLIDE 47
  • Precipitation (10.4 km, T1279L24)
  • Specific Humidity at 850 hPa (about 1500 m a.s.l.)

SLIDE 48
  • Cyclones around the Madagascar Islands (specific humidity)
  • Seasonal Variation of Sea Surface Temperature

10 km resolution for oceans (previously 100 km)

SLIDE 49
  • Distributed and Parallel Systems

[Diagram: a spectrum from heterogeneous distributed systems to homogeneous massively parallel systems: SETI@home, Entropia, Grid-based computing, network of workstations, Beowulf clusters, clusters with special interconnect, parallel distributed-memory machines (ASCI Tflops).]

[Figure: number of machines vs. performance. 1976: the supercomputing "island"; today: a continuum.]

SLIDE 50
  • The Future of HPC
  • Highly Parallel Supercomputing: Where Are We?

SLIDE 51
  • Highly Parallel Supercomputing: Where Are We?
  • The Importance of Standards - Software

SLIDE 52
  • The Importance of Standards - Hardware
  • Achieving TeraFlops

SLIDE 53
  • Future: Petaflops (10^15 fl pt ops/s)

1 PFlop/s = 10^15 floating point operations per second; compare this with the flop/s of today's workstations.

  • A Petaflops Computer System

1 Pflop/s of sustained computing
Between 10,000 and 1,000,000 processors
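The processor-count range above translates directly into how fast each processor would have to be; a small illustrative calculation (the 100,000-processor midpoint is added only for scale):

```python
# Per-processor speed required to sustain 1 Pflop/s at different processor counts.
TARGET_FLOPS = 1e15
for processors in (10_000, 100_000, 1_000_000):
    per_proc_gflops = TARGET_FLOPS / processors / 1e9
    print(f"{processors:>9,} processors -> {per_proc_gflops:7.1f} Gflop/s each")
```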