
Efficiency of numerical algorithms on future high performance supercomputers
Ulrich Rüde
Institut für Mathematik, Universität Augsburg
http://scicomp.math.uni-augsburg.de/ruede/me.html
DFG project: Datenlokale Iterationsverfahren (data-local iterative methods)
March 1998

Outline
- The efficiency paradox
- What is wrong with our algorithms and programs
- Cache-oriented iterative methods
- High performance computer architecture
- Scientific computing in the future

Ideally ...
Mathematical arguments predict that
- multigrid with nested iteration can solve a scalar elliptic PDE with approximately
  - 100 operations and
  - storage of 8 reals per unknown
- on a workstation
  - that can do 1 × 10^9 operations (= 1000 MFlop) per second
  - in 64 × 10^6 words (= 512 MByte) of storage,
and so we can solve for 10^7 unknowns in about one second.
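The estimate on this slide is plain arithmetic. The minimal Python sketch below redoes it with the slide's figures: 100 operations and 8 reals per unknown, 10^9 operations per second, and 64 × 10^6 words of 8 bytes. The function names are hypothetical. The sketch also shows that the storage bound works out to 8 × 10^6 unknowns, which the slide rounds to "about" 10^7.

```python
# Back-of-the-envelope estimate for the "ideal" multigrid solve, using the
# figures stated on the slide (function names are illustrative only).

OPS_PER_UNKNOWN = 100      # multigrid with nested iteration, scalar elliptic PDE
REALS_PER_UNKNOWN = 8      # storage per unknown
FLOP_RATE = 1e9            # operations per second (= 1000 MFlop)
MEMORY_WORDS = 64e6        # words of storage
BYTES_PER_WORD = 8         # 64-bit reals, so 64e6 words = 512 MByte

def solve_time_seconds(unknowns: float) -> float:
    """Time for one solve if the processor runs at its full operation rate."""
    return unknowns * OPS_PER_UNKNOWN / FLOP_RATE

def max_unknowns_in_memory() -> float:
    """Largest problem that fits when each unknown needs REALS_PER_UNKNOWN words."""
    return MEMORY_WORDS / REALS_PER_UNKNOWN

def memory_mbyte() -> float:
    """Storage size in (decimal) MByte."""
    return MEMORY_WORDS * BYTES_PER_WORD / 1e6

if __name__ == "__main__":
    print(solve_time_seconds(1e7))     # 1.0 second for 10^7 unknowns
    print(max_unknowns_in_memory())    # 8000000.0, i.e. roughly 10^7
    print(memory_mbyte())              # 512.0 MByte
```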

In practice ...
our programs
- can sometimes only handle 10^5 unknowns
- on a (massively parallel) supercomputer
- where it does not run for hours.

Run time comparison of iterative algorithms on uniform grids
- Standard multigrid
- Adaptive multigrid
- SOR
- SOR with cache optimization

Efficiency of Poisson solvers
Benchmark suggested by Botta et al. in: "How fast the Laplace equation was solved in 1995".

[Figure: Performance of Multigrid Poisson Solver. Time per unknown (microseconds) versus level L (grid size 2^L) for Digital PWS 500au, SGI O200 (180 MHz), HP 9000/755 (99 MHz), P-II/266 (SDRAM) and P-Pro/200.]

- At 1 GFlop performance, 250 operations per unknown should be executed in 0.25 µs.
- For small data sets we thus reach up to 25% of peak performance. For large data sets we reach less than 7% of peak performance.
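The 0.25 µs figure and the peak percentages come from comparing the ideal time per unknown with a measured one. The small sketch below makes that comparison. Its function names are hypothetical, and the measured times are inputs a reader would take from the plot, not data supplied here.

```python
# Ideal time per unknown and the resulting fraction of peak performance.
# 250 operations per unknown and a 1 GFlop/s peak are taken from the slide.

OPS_PER_UNKNOWN = 250
PEAK_FLOPS = 1e9

def ideal_time_us(ops: float = OPS_PER_UNKNOWN, peak: float = PEAK_FLOPS) -> float:
    """Time per unknown in microseconds when running exactly at peak."""
    return ops * 1e6 / peak

def fraction_of_peak(measured_us: float) -> float:
    """Achieved fraction of peak, given a measured time per unknown (microseconds)."""
    return ideal_time_us() / measured_us

def time_for_fraction_us(fraction: float) -> float:
    """Measured time per unknown (microseconds) that corresponds to a fraction of peak."""
    return ideal_time_us() / fraction
```

For example, 1 µs per unknown corresponds to 25% of peak, and roughly 3.6 µs corresponds to 7%. This is presumably how the percentages above were read off the plot.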

Efficiency (cont'd)
- The benchmark requires an error reduction of 10^6 in the residual.
- This is oversatisfied by 5 V(2,1) cycles costing 250 floating point operations per unknown.
- Using a V(2,1)-FMG algorithm, this could be reduced by a factor of 4.
Comparison with the two best performers from the Botta paper:

[Figure: Performance of Multigrid Poisson Solver. Time per unknown (microseconds) versus level L (grid size 2^L) for Digital PWS 500au, SGI O200 (180 MHz), HP 9000/755 (99 MHz), P-II/266 (SDRAM), P-Pro/200, plus MILU-rrb (on HP755) and NGILU (on HP755).]
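Whether a fixed number of V-cycles meets the 10^6 residual reduction depends on the per-cycle convergence factor, which the slide does not state. The sketch below treats that factor `rho` as an assumed input. The 50 operations per cycle are simply 250 / 5 from the slide, and the FMG estimate applies the quoted factor of 4. The names are hypothetical.

```python
# Number of multigrid cycles for a given residual reduction, and their cost.
# The convergence factor rho is an ASSUMPTION supplied by the caller; the
# slide only gives 5 V(2,1) cycles and 250 operations per unknown.

REQUIRED_REDUCTION = 1e-6                 # benchmark: reduce the residual by 10^6
TOTAL_OPS_5_CYCLES = 250                  # 5 V(2,1) cycles, operations per unknown
OPS_PER_VCYCLE = TOTAL_OPS_5_CYCLES / 5   # hence about 50 operations per cycle
FMG_SAVING = 4                            # slide: FMG could cut the cost by 4

def cycles_needed(rho: float, reduction: float = REQUIRED_REDUCTION) -> int:
    """Smallest k with rho**k <= reduction, for a convergence factor 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    k, factor = 0, 1.0
    while factor > reduction:
        factor *= rho
        k += 1
    return k

def vcycle_cost(rho: float) -> float:
    """Operations per unknown for plain V(2,1) iteration under the assumed rho."""
    return cycles_needed(rho) * OPS_PER_VCYCLE

fmg_estimate = TOTAL_OPS_5_CYCLES / FMG_SAVING   # 62.5 operations per unknown
```

With an assumed rho = 0.06, `cycles_needed` returns 5, which matches the slide. A slower rho = 0.2 would already require 9 cycles, or about 450 operations per unknown.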

Which algorithms and data structures spoil performance?
Perform red-black relaxation on a Digital Alpha PWS 500au using a
- structured grid with constant coefficients
- structured grid with variable coefficients
- unstructured grid, implemented with a linked list, but with all data ideally cache aligned
- unstructured grid, data not cache aligned
- structured grid, constant coefficients, optimized for cache performance
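For reference, the sketch below shows the first variant: red-black relaxation (Gauss-Seidel) on a structured grid with constant coefficients. It assumes the 5-point discretization of the Poisson equation on the unit square with mesh width h. It is written in Python for readability, so it illustrates the update order only and says nothing about the MFlop rates measured here. The function names are hypothetical.

```python
# Red-black Gauss-Seidel for -(u_xx + u_yy) = f, structured grid, constant
# coefficients, 5-point stencil, mesh width h. Boundary values stay fixed.

def red_black_sweep(u: list[list[float]], f: list[list[float]], h: float) -> None:
    """One sweep in place: all red points (i + j even), then all black points."""
    n = len(u)
    for color in (0, 1):                      # 0 = red, 1 = black
        for i in range(1, n - 1):
            # first interior column j with (i + j) % 2 == color
            start = 1 if (i + 1) % 2 == color else 2
            for j in range(start, n - 1, 2):
                # a point of one color only reads neighbors of the other color
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])

def max_residual(u: list[list[float]], f: list[list[float]], h: float) -> float:
    """Maximum norm of the residual f + Laplace_h(u) over the interior points."""
    n = len(u)
    r = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j]) / (h * h)
            r = max(r, abs(f[i][j] + lap))
    return r
```

As a usage example, take a 9 × 9 grid with f = 0, boundary values 1 and interior values 0. Running 300 sweeps drives the interior to 1, which is the exact discrete solution.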

Performance of red-black relaxation, depending on vector length

[Figure: Performance of Red Black Relaxation. MFlop/s versus grid size (16 to 2048) for: structured, constant coefficients; structured, constant coefficients, cache tuned; structured, variable coefficients; unstructured, cache aligned access; unstructured, non-cache aligned. A second panel (grid size 16 to 1024) shows only the structured variable-coefficient variant and the two unstructured variants on a smaller scale.]

Memory hierarchy (DEC PWS 600)
- CPU: 32 registers
- Level 1 cache: 1000 W
- Level 2 cache: 12000 W
- external Level 3 cache: 0.5 MW
- main memory: 64 MW
- disk space: 1 GW
(W = words)

Level        Capacity  Bandwidth (MB/s)  Latency
FP register  256 B     28,800            1.7 ns
Cache 1      8 KB      19,200            1.7 ns
Cache 2      96 KB     9,600             5.0 ns
Cache 3      2 MB      873               23.3 ns
Main memory  1,536 MB  1,070             105.0 ns
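One way to relate the grid sizes in the red-black plot to this hierarchy is to ask which level can hold the working set. The sketch below answers that question under stated assumptions: 8-byte reals, binary KB/MB, and two n × n arrays such as u and f. The table describes a PWS 600 while the measurements used a PWS 500au, so the mapping is only indicative. The function names are hypothetical.

```python
# Smallest level of the memory hierarchy that can hold the working set of an
# n x n grid. Capacities come from the table; 8-byte reals, binary KB/MB and
# two arrays per grid (e.g. u and f) are ASSUMPTIONS.

LEVELS = [  # (name, capacity in bytes), smallest first
    ("FP register", 256),
    ("Cache 1", 8 * 1024),
    ("Cache 2", 96 * 1024),
    ("Cache 3", 2 * 1024**2),
    ("Main memory", 1536 * 1024**2),
]

def working_set_bytes(n: int, arrays: int = 2, bytes_per_real: int = 8) -> int:
    """Bytes occupied by `arrays` grids of n x n reals."""
    return arrays * n * n * bytes_per_real

def smallest_level(n: int, arrays: int = 2) -> str:
    """Name of the first (smallest) level whose capacity holds the working set."""
    need = working_set_bytes(n, arrays)
    for name, capacity in LEVELS:
        if need <= capacity:
            return name
    return "Disk space"

if __name__ == "__main__":
    for n in (16, 32, 64, 128, 256, 512, 1024, 2048):
        print(n, smallest_level(n))
```

Under these assumptions, n = 16 fits in Cache 1, n = 32 and 64 fit in Cache 2, and n = 128 and 256 fit in Cache 3. From n = 512 upward, the working set fits only in main memory.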

Backus (1977): "I propose to call this tube the von Neumann bottleneck."
What are the consequences? To avoid inefficiency we must:
- Avoid dynamic structures. No linked lists, binary trees, etc. at too fine a granularity. How to implement sparse matrices then? We don't.
- Exploit instruction-level parallelism. Prepare the codes such that automatic restructuring tools and compilers (optimizers) can extract the parallelism. F90 and HPF array syntax are counterproductive, since we also need to
- Exploit data locality. Do not program in global sweeps! We cannot save anything when forming a single Ax, but we can save when Ax, A²x, A³x, ... are needed. This is awkward programming, and in the future we will need tools for this job!
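The remark about Ax, A²x, A³x, ... can be made concrete with a small sketch. Purely for illustration, it assumes that A is the 1D Laplacian tridiag(-1, 2, -1) with zero boundary values. `powers_global` performs k global sweeps. `powers_fused` computes all k powers in one pass, so in a compiled code only a few recent entries of each vector would need to stay in cache. This is one way of organizing the computation (loop fusion over the powers), not necessarily the author's. Python lists will not show the cache effect itself, and the function names are hypothetical.

```python
# A x, A^2 x, ..., A^k x with k global sweeps versus one fused pass.
# ASSUMPTION: A = tridiag(-1, 2, -1) (1D Laplacian) with zero boundary.

def apply_A(v: list[float]) -> list[float]:
    """One global sweep: y = A v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def powers_global(x: list[float], k: int) -> list[list[float]]:
    """k separate sweeps; each one streams a whole vector through memory."""
    result, v = [], x
    for _ in range(k):
        v = apply_A(v)
        result.append(v)
    return result

def powers_fused(x: list[float], k: int) -> list[list[float]]:
    """Single pass over the index range. At step i, power p computes entry i - p.
    Its inputs (entries i-p-1, i-p, i-p+1 of power p-1) already exist, so only
    a small window of recent entries is active at any time."""
    n = len(x)
    levels = [list(x)] + [[0.0] * n for _ in range(k)]

    def get(v: list[float], j: int) -> float:
        return v[j] if 0 <= j < n else 0.0   # zero boundary outside the grid

    for i in range(n + k):
        for p in range(1, k + 1):
            j = i - p
            if 0 <= j < n:
                prev = levels[p - 1]
                levels[p][j] = 2.0 * prev[j] - get(prev, j - 1) - get(prev, j + 1)
    return levels[1:]
```

Both functions return identical vectors, for example for x = [1, 2, 3, 4, 5] and k = 3.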

PAM - Patch Adaptive Multigrid
- nodes are grouped in (non-overlapping) patches of fixed size
- each level consists of a collection of patches
- patches may be present (live) or
- patches may be virtual (ghosts)
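The minimal data-structure sketch below illustrates the four points above, assuming square 2D patches. The class names, the patch size of 4 × 4 nodes, and the choice to allocate storage only for live patches are hypothetical. None of them are taken from the PAM implementation.

```python
# Hypothetical sketch of a patch-based grid level in the spirit of PAM:
# nodes are grouped into non-overlapping patches of fixed size, each level is
# a collection of patches, and a patch is either live (stored) or a ghost.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PATCH = 4  # assumed patch size: PATCH x PATCH nodes per patch

@dataclass
class Patch:
    live: bool = False                    # False: virtual patch (ghost), no storage
    data: Optional[List[float]] = None    # node values, allocated only when live

    def activate(self) -> None:
        """Turn a ghost into a live patch and allocate its node values."""
        if not self.live:
            self.live = True
            self.data = [0.0] * (PATCH * PATCH)

@dataclass
class Level:
    patches_per_side: int
    patches: Dict[Tuple[int, int], Patch] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every patch position on this level exists; all start as ghosts.
        for pi in range(self.patches_per_side):
            for pj in range(self.patches_per_side):
                self.patches[(pi, pj)] = Patch()

    def locate(self, i: int, j: int) -> Tuple[Patch, int]:
        """Patch containing global node (i, j) and the node's offset inside it."""
        patch = self.patches[(i // PATCH, j // PATCH)]
        return patch, (i % PATCH) * PATCH + (j % PATCH)

    def set(self, i: int, j: int, value: float) -> None:
        """Store a node value; only nodes in live patches have storage."""
        patch, k = self.locate(i, j)
        if not patch.live:
            raise ValueError(f"node ({i}, {j}) lies in a ghost patch")
        patch.data[k] = value

    def live_count(self) -> int:
        return sum(p.live for p in self.patches.values())

# A multigrid hierarchy is then a list of levels, coarse to fine, e.g.
# hierarchy = [Level(1), Level(2), Level(4)]
```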

History and future of microprocessors

Technology data
Year  Type   MHz    µm    Transistors  CPI  MFlop
1982  80286  12     1.5   0.14 M       30   0.4
1985  80386  33     1.0   0.28 M       12   3.0
1997  21164  625    0.35  9.30 M       0.5  1.25G
2011  Int-X  10000  ?     1000.00 M    ?    ?

Improvement factors
             1982-1997          1997-2011
MHz          50                 16
Transistors  65                 100
MFlop        3000 (≈ 50 × 65)   ???
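The improvement factors can be recomputed from the table rows; the slide's 50, 65 and 100 are rounded values. In the sketch below, the dictionary layout and function names are hypothetical, and "1.25G" in the MFlop column is read as 1250 MFlop.

```python
# Recomputing the improvement factors from the technology table.
from typing import Dict, Optional

TABLE = {
    # year: (type, MHz, transistors in millions, MFlop)
    1982: ("80286", 12, 0.14, 0.4),
    1985: ("80386", 33, 0.28, 3.0),
    1997: ("21164", 625, 9.30, 1250.0),
    2011: ("Int-X", 10000, 1000.00, None),   # projection; MFlop unknown
}

def improvement(y0: int, y1: int) -> Dict[str, Optional[float]]:
    """Ratios of clock rate, transistor count and MFlop between two table rows."""
    _, mhz0, tr0, fl0 = TABLE[y0]
    _, mhz1, tr1, fl1 = TABLE[y1]
    return {
        "MHz": mhz1 / mhz0,
        "Transistors": tr1 / tr0,
        "MFlop": fl1 / fl0 if fl1 is not None else None,
    }
```

This gives about 52 (MHz), 66 (transistors) and 3125 (MFlop) for 1982-1997, and 16 and about 108 for 1997-2011. The MFlop gain is of the same order as the product of the clock and transistor factors, as the slide's "≈ 50 × 65" indicates.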

Dave Patterson (1997, in SIAM News): "Instruction level parallelism is running out of steam."
Interpretation
- A microprocessor today
  - is faster than the fastest supercomputer 15 years ago.
  - has internal parallelism equivalent to the largest parallel processor 15 years ago.
- A microprocessor in 2011
  - could be faster than the fastest supercomputer today (if we find a way to exploit what technology will make possible).
  - could employ as much internal parallelism as a massively parallel computer today.

J. Goodman & D. Burger (1997, in an IEEE Computer editorial): "The circumstances in which computer architects will find themselves in the next 15 years are truly daunting."
Memory systems
- To sustain the peak performance of a 1.25 GFlop processor today, the memory system needs a bandwidth of 30 GByte/sec, but typical (main) memory systems only deliver 1 GByte/sec.
- A hypothetical 1.25 TFlop processor would need 30 TByte/sec of memory bandwidth. Even with a 4096 bit memory bus, it would still require a bus clock of 60 GHz.
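The numbers in the quote are related by simple ratios. 30 GByte/sec for 1.25 GFlop is 24 bytes per floating point operation, which one reading takes as three 8-byte words per operation. A 4096 bit bus moves 512 bytes per clock. The sketch below works through these ratios, assuming one transfer per bus clock; the function names are hypothetical.

```python
# Memory bandwidth arithmetic behind the Goodman & Burger quote.

def bytes_per_flop(bandwidth_bytes_s: float, flops: float) -> float:
    """Bytes the memory system must deliver per floating point operation."""
    return bandwidth_bytes_s / flops

def bus_clock_hz(bandwidth_bytes_s: float, bus_bits: int) -> float:
    """Clock rate needed to move the given bandwidth over a bus of `bus_bits`,
    ASSUMING one transfer per clock."""
    return bandwidth_bytes_s / (bus_bits / 8)

today = bytes_per_flop(30e9, 1.25e9)          # 24 bytes per flop
future_bandwidth = today * 1.25e12            # bytes/s for a 1.25 TFlop processor
clock = bus_clock_hz(future_bandwidth, 4096)  # about 5.9e10 Hz
memory_bound = 1e9 / today                    # flop/s that 1 GByte/sec can feed
```

The exact bus clock is about 58.6 GHz, which the quote rounds to 60 GHz. By the same ratio, the 1 GByte/sec of a typical memory system would feed only about 42 MFlop/s.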
