SSI: Overview of Simulation Software Infrastructure for Large Scale Scientific Applications
Akira Nishida
Department of Computer Science, University of Tokyo / JST CREST / 98th IPSJ SIGHPC Meeting
Motivation
Emergence of large scale scientific computing (vector supercomputers) in the 1980s
Development in the US since the 1990s (also mirrored in Japan)
1970
1979
during 1987-1995
Matrix Solvers) since 1996
started in 2001 by DoE (development of hardware/software infrastructure for terascale computing)
Program System for Statistical Analysis with Least-Squares Fitting (T. Nakagawa and Y. Oyanagi et al., 1976-1982)
A series of books by K. Murata, T. Oguni, and H. Hasegawa, published by Maruzen Co., Ltd. with floppy disks
libraries
distributed memory architectures
parallel architectures
structured networks
Linear solvers
Tsukuba Univ., Japan Meteorological Agency, Earth Simulator Center, AIST Grid Research Center, etc.
Institute for Solid State Physics, Institute of Medical Science, etc.
Institute of Industrial Science, RIST, Advancesoft Corp. (MEXT IT Program)
Fast integral transforms
Eigensolvers
Implementation methods
Computing and networking environment
Tutorials; implementation and verification; programming model; algorithms; surveys of hardware technologies, software engineering, and applications; facilities
Fiscal years: 2002 (5 months), 2003, 2004, 2005, 2006, 2007 (7 months)
libraries, while their design grows more complex
(Intel Madison 1.3GHz × 32, Linux OS, 32GB main memory)
× 16, GbE interconnect
(Cisco C6509)
covered
the SSI environment by the developers
SGI Altix 3700; NEC SX-6i; GbE or 10GbE LAN; InfiniBand-interconnected Itanium2 cluster; HyperTransport-interconnected Opteron cluster; Sun Fire 3800; Sun StorEdge T3; Cisco Router C6509 to the GbE (→10GbE) WAN and to desktops
Shared memory computer SGI Altix 3700
Memory bandwidth performance compared with Sun
[STREAM benchmark plots: memory bandwidth (MB/s) vs. number of threads for the Copy, Scale, Add, and Triad kernels]
computing environment
Fortran 77” published by Maruzen Co., Ltd. by Hasegawa et al.
environment.
http://ssi.is.s.u-tokyo.ac.jp/ has been opened
Eigensolvers (CG Type)
matrices Ax = λBx
Bx = μAx, μ = 1/λ
μ(x) = xᵀBx / xᵀAx, using the fact that the steepest ascent direction is ∇μ(x) ≡ g(x) = 2(Bx − μAx) / xᵀAx, and applying the conjugate gradient method with step length αᵢ:
x_{i+1} = x_i + α_i p_i,  p_i = −g_i + β_{i−1} p_{i−1},  β_{i−1} = g_iᵀg_i / g_{i−1}ᵀg_{i−1}
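The iteration above can be sketched in Python. This is a minimal illustration, not the project's implementation: the function names are ours, the β is Fletcher-Reeves as on the slide, and instead of a closed-form α we maximize the Rayleigh quotient exactly over span{x, p} via a 2×2 pencil (a common, more robust line search for this problem).

```python
import numpy as np

def max_gen_eig_2x2(Bs, As):
    """Largest eigenpair of the 2x2 symmetric pencil Bs y = mu As y,
    reduced to a standard problem via the Cholesky factor of As."""
    L = np.linalg.cholesky(As)
    Linv = np.linalg.inv(L)
    M = Linv @ Bs @ Linv.T
    w, V = np.linalg.eigh(0.5 * (M + M.T))   # symmetrize for safety
    return w[-1], Linv.T @ V[:, -1]          # back-transform eigenvector

def cg_eigensolver(A, B, iters=200, tol=1e-10, seed=0):
    """Nonlinear CG maximizing mu(x) = x'Bx / x'Ax, i.e. the largest mu of
    B x = mu A x (so lambda = 1/mu is an extreme eigenvalue of Ax = lambda Bx).
    Illustrative sketch; A must be SPD here."""
    n = A.shape[0]
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.sqrt(x @ A @ x)                  # A-normalize the start vector
    g_prev, p, mu = None, None, (x @ B @ x) / (x @ A @ x)
    for _ in range(iters):
        mu = (x @ B @ x) / (x @ A @ x)
        g = 2.0 * (B @ x - mu * (A @ x)) / (x @ A @ x)   # gradient of mu
        if np.linalg.norm(g) < tol:
            break
        if p is None:
            p = g.copy()                     # first step: steepest ascent
        else:
            beta = (g @ g) / (g_prev @ g_prev)           # Fletcher-Reeves
            p = g + beta * p
        S = np.column_stack([x, p])          # exact search over span{x, p}
        mu, c = max_gen_eig_2x2(S.T @ B @ S, S.T @ A @ S)
        x = S @ c
        x /= np.sqrt(x @ A @ x)
        g_prev = g
    return mu, x
```

The subspace step makes each iteration at least as good as any scalar α, at the cost of one small dense eigenproblem per step.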
Eigensolvers
and Argentati (2003) (see figures)
AMG-PCG Preconditioner (1,445 sec.)
[Convergence history plots: residual vs. iteration count]
Linear solvers
Bi-CR type method)
Zhang, T. Sogabe, Bi-CR method for solving large nonsymmetric linear systems, The 2003 International Conference on Numerical Linear Algebra and Optimization, October 7-10, 2003. (Invited Talk)
x_{n+1} = x_n + α_n p_n,  r_{n+1} = r_n − α_n A p_n
Replace CG in Bi-CG with the more stable CR algorithm; tested with Toeplitz matrices and some Matrix Market matrices
Derived CRS, BiCRSTAB, and GPBiCR, which correspond to
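The CR building block being substituted for CG can be sketched for the symmetric case as follows. This is an illustrative `cr_solve` of the classical Conjugate Residual method, not the authors' Bi-CR code; Bi-CR extends this recurrence to nonsymmetric systems.

```python
import numpy as np

def cr_solve(A, b, x0=None, tol=1e-10, maxiter=500):
    """Conjugate Residual method for a symmetric matrix A.
    Minimizes ||b - A x|| over the Krylov subspace at each step."""
    x = np.zeros(len(b)) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    Ar = A @ r
    Ap = Ar.copy()
    rAr = r @ Ar
    for _ in range(maxiter):
        alpha = rAr / (Ap @ Ap)          # step length from residual norm
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        rAr_new = r @ Ar
        beta = rAr_new / rAr             # keeps A p_i mutually orthogonal
        rAr = rAr_new
        p = r + beta * p
        Ap = Ar + beta * Ap              # update A p without an extra matvec
    return x
```

Unlike CG, the recurrence only needs A to be symmetric, not positive definite, which is the stability property the slide alludes to.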
Aggregation MG for Anisotropic Problems. In Proceedings of the Symposium on Advanced Computing Systems and Infrastructures, pp. 137-144, 2003.
Decomposition. IPSJ Transactions on Advanced Computing Systems, Vol. 44, No. SIG 6 (ACS 1), pp. 9-17, 2003.
Aggregation MG
Tatebe and Oyanagi
Parallelization of a Direct Linear Solver for Banded Matrices using
aggregat e
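The aggregation idea can be sketched on a toy problem. All names here (`laplacian_1d`, `aggregate_pairs`, `two_grid_step`) are illustrative, not the cited implementation: pairwise aggregates give a piecewise-constant prolongation P, and the coarse operator is the Galerkin product PᵀAP.

```python
import numpy as np

def laplacian_1d(n):
    """Standard 1-D Poisson matrix, used here only as a test problem."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def aggregate_pairs(n):
    """Piecewise-constant prolongation merging neighbouring pairs of
    unknowns into one aggregate (the simplest aggregation rule)."""
    P = np.zeros((n, (n + 1) // 2))
    for i in range(n):
        P[i, i // 2] = 1.0
    return P

def two_grid_step(A, P, b, x, nu=2, omega=2/3):
    """One two-grid cycle: damped Jacobi smoothing, then a coarse
    correction with the Galerkin operator P^T A P, then smoothing again."""
    D = np.diag(A)
    for _ in range(nu):                              # pre-smoothing
        x = x + omega * (b - A @ x) / D
    Ac = P.T @ A @ P                                 # Galerkin coarse operator
    x = x + P @ np.linalg.solve(Ac, P.T @ (b - A @ x))
    for _ in range(nu):                              # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x
```

Iterating `two_grid_step` reduces the residual by a roughly constant factor per cycle; smoothed-aggregation variants improve that factor by polishing P.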
Fast integral transforms
Joint studies with researchers in the fields of weather forecasting and earth hydrodynamics
Efficient implementation of parallel FFT algorithms in a (multiprocessor) node
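The node-level kernel such work optimizes can be illustrated by a textbook radix-2 FFT. This is a sketch only; production implementations block these butterflies for cache reuse and map the twiddle multiply plus add/subtract onto fused multiply-add units.

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT of a sequence whose length
    is a power of two. Returns the DFT as a list of complex numbers."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])                              # DFT of even samples
    odd = fft(a[1::2])                               # DFT of odd samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle multiply
        out[k] = even[k] + t                             # butterfly
        out[k + n // 2] = even[k] - t
    return out
```

Each level performs n/2 butterflies, so the total cost is O(n log n) multiply-adds, which is why the inner loop dominates node-level performance.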
Multiply-add Instructions. In Proceedings of the High Performance Computing Symposium 2004, pp. 17-24.
Algorithm on Distributed Shared Memory Architecture and its Optimization. IPSJ Transactions on Advanced Computing Systems, Vol. 44, No. SIG 6 (ACS 1), pp. 1-8, 2003.
NEC AzusA)
Provide general-purpose, easy-to-use software
Surveys of the status and directions of programming
FPC and the Earth Simulator Center
Technology Strategic Core”
Optimization of communication in cluster/grid environments
Networks”, IPSJ Transactions on Advanced Computing Systems, to appear.
LAM)
processes)
E.g., performance changes significantly when the broadcast root is altered in a naïve implementation of binary-tree-based algorithms
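The standard fix for this root dependence is to relabel ranks as (rank − root) mod p, so the communication tree keeps its shape for every root. A sketch (the function name is ours) that emits the schedule of a binomial-tree broadcast:

```python
def binomial_bcast_schedule(p, root):
    """(round, sender, receiver) triples for a binomial-tree broadcast
    among p processes. Virtual rank v = (rank - root) mod p places the
    actual root at virtual rank 0, so the round count is root-independent."""
    sched = []
    rnd, mask = 0, 1
    while mask < p:
        for v in range(mask):                 # current holders of the data
            if v + mask < p:                  # each sends one hop outward
                sender = (v + root) % p
                receiver = (v + mask + root) % p
                sched.append((rnd, sender, receiver))
        mask <<= 1
        rnd += 1
    return sched
```

With relabeling, any root reaches all p − 1 other ranks in ⌈log₂ p⌉ rounds; a naïve tree hard-wired to rank 0 would pay an extra hop whenever root ≠ 0, which is one source of the performance variation noted above.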
SCore and LAM
Performance of computers is expected to keep progressing rapidly
Parallel simulation technology is to be used in wider areas with the popularization of distributed computing
Domestic effort for software infrastructure for massively parallel computing
Design for long-term use at home and overseas: supposed to be used by researchers working at supercomputing centers and research laboratories as practical components
Publish official manuals on the algorithms and their usage; target a standard, high-quality library
Distribution of high-quality common components for scientific simulation
Establishment of reliable design/evaluation methodologies via feedback from users