SLIDE 1

THE SDUMONT SUPERCOMPUTER: VIEWS FROM THOSE WHO USE IT (, THOSE WHO PROGRAM IT) AND THOSE WHO OPERATE IT

ANTÔNIO TADEU AZEVEDO GOMES, LNCC/MCTI

SLIDE 2

THE SANTOS DUMONT SUPERCOMPUTER

SLIDE 3

9+1 CENTERS

  • SERVICE PROVISIONING
  • DEVELOPMENT (E.G. SCIENCE GATEWAYS)
  • TRAINING

400 TFLOPS 5.2 PFLOPS 226 TFLOPS

SLIDE 4

LNCC

The SDumont petaflop-scale facility

SLIDE 5

SDUMONT 1.0

CONFIGURATION (BULLX)

▸ ~1.2 PFlops computing capability
▸ 758 nodes: B710 Ivy Bridge; B715 Ivy Bridge + K40 (2 per node); B715 Ivy Bridge + Phi KC (2 per node), 64 GB; S6030 Ivy Mesca2, 6 TB; DGX-1 V100 (8 per node)
▸ ~1.7 PB Lustre storage; InfiniBand interconnect (FDR)

[Charts per node type (B710, B715 PHI, B715 K40, DGX-1, S6030). Values as extracted, first chart: 1, 1, 198, 54, 504; second chart: 12, 104, 456, 363, 321]

SLIDE 6

SDUMONT 2.0

CONFIGURATION (SEQUANA)

▸ + ~4.0 PFlops computing capability
▸ 376 nodes with 3 configurations: X1120 CascadeLake with 384 GB and 768 GB; X1125 Volta V100 (4 per node)
▸ + ~1 PB Lustre storage; InfiniBand interconnect (EDR)

[Charts per node configuration (X1120 CL 384G, X1120 CL 768G, X1125 CL+V100). Values as extracted, first chart: 94, 36, 246; second chart: 2 900, 115, 785]

SLIDE 7

ABOUT THOSE WHO USE IT

SLIDE 8

  • 5 OPEN CALLS (PROJECTS FROM 1ST CALL ENDED IN 2018; FROM 5TH CALL BEGINNING THIS YEAR)
  • 230+ PROJECTS IMPLEMENTED (PEER-REVIEWED)
  • 1,200+ ACTIVE USERS
  • 500,000+ JOBS AND 530,000,000+ SERVICE UNITS SINCE AUG/2016
  • 720+ TERABYTES STORED

[Charts by research area: Chemistry, Physics, Engineering, Biological sciences, Computer science, Health sciences, Geosciences, Weather/climate, Astronomy, Maths, Material sciences, Biodiversity, Pharmacy, Economy, Oceanography, Agricultural sciences, Social sciences, Linguistics]
[Values as extracted, first chart: 1, 2, 2, 2, 3, 9, 19, 18, 14, 11, 20, 25, 52, 53, 62, 92, 99; second chart: 63, 56, 21, 18, 7, 8, 1, 5, 3, 2, 1, 2, 2]

SLIDE 9

SLIDE 10

  • Zika / Dengue
  • Cell signaling
  • Painkillers
  • Inflammatory processes
  • Antimicrobial peptides

SLIDE 11

SLIDE 12

  • Resistant nanostructures
  • CO2 catalysis
  • Nuclear magnetic resonance (NMR) parameterization
  • Catalytic hydrogen production
  • CO2 capture

SLIDE 13

SLIDE 14

  • Heart electric-mechanical processes
  • Combustion engines
  • Avionics
  • Multiscale porous-media flows
  • Seismic inversion

SLIDE 15

SLIDE 16

  • Hemodynamics
  • Evolution of dwarf galaxies
  • Design of photovoltaic cells
  • Cosmic collisions

SLIDE 17

SLIDE 18

  • Electrochemical interfaces
  • Transport systems
  • Industrial automation
  • Sentiment analysis

SLIDE 19

SLIDE 20

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 21

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 22

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 23

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 24

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 25

SDUMONT: ACTIONS RELATED TO COVID-19

SLIDE 26

ABOUT THOSE WHO USE IT (, THOSE WHO PROGRAM IT) AND THOSE WHO OPERATE IT

SLIDE 27

WHERE TO BEGIN

MODULES, MODULES, MODULES…

SLIDE 28

WHERE TO BEGIN

MODULES, MODULES, MODULES…

SLIDE 29

WHERE TO BEGIN

MODULES, MODULES, MODULES…

SLIDE 30

WHERE TO BEGIN

MODULES, MODULES, MODULES…

SLIDE 31

WHERE TO BEGIN

QUEUES, QUEUES, QUEUES…

SLIDE 32

WHERE TO BEGIN

SALLOC, SRUN, SBATCH, SQUEUE, SACCT…

THE ANATOMY OF A JOB IN SDUMONT
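
The slide names the Slurm commands without showing a job. As a minimal sketch of what a batch job could look like, the Python helper below renders a script that `sbatch` could submit. The `#SBATCH` options used (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`, `--partition`) are standard Slurm options, but the function name, the partition name `example_queue`, and the application `./my_app` are hypothetical; the actual queue names and limits on SDumont are not given in the slides.

```python
def render_batch_script(job_name, nodes, tasks_per_node, walltime, partition, command):
    """Render a Slurm batch script as a string.

    All argument values are illustrative; queue (partition) names and
    limits are site-specific and are NOT taken from the slides.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",        # wall-clock limit, HH:MM:SS
        f"#SBATCH --partition={partition}",  # the queue to submit to
        "",
        f"srun {command}",                   # launch the tasks inside the allocation
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 nodes, 24 tasks per node, 30 minutes.
    print(render_batch_script("demo", 2, 24, "00:30:00", "example_queue", "./my_app"))
```

A script like this would typically be submitted with `sbatch`, watched with `squeue`, and reviewed afterwards with `sacct`; `salloc` requests an interactive allocation instead of a batch one.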

SLIDE 33

ABOUT THOSE WHO OPERATE IT

SLIDE 34

O&M

▸ Shared operation
▸ LNCC: user services
▸ ATOS/Bull: availability (power outages, cooling problems…)

MONITORING

▸ 24x7 / 8x5
▸ NAGIOS (automated) + GRAFANA (manual/analysis)
▸ Version control
▸ Monthly reports

SLIDE 35

ANALYTICS

THE SYSTEMS’ BEHAVIOR

Decile   Value
0 %
10 %
20 %
30 %     1
40 %     1
50 %     10
60 %     111
70 %     2554
80 %     28055
90 %     25827
100 %    1088599

~18 hours. Between 7 hours and 12 days!

Decile   Value
0 %
10 %
20 %
30 %     1
40 %     1
50 %     10
60 %     111
70 %     2554
80 %     28055
90 %     112358
100 %    4920842

~57 hours. Between 1 and 23 days!
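
The decile tables above span several orders of magnitude, which is typical of heavily skewed measurements. The slide does not say which metric is tabulated or how it was computed, so the following is only a sketch of how such a table could be derived from raw measurements with Python's standard library. The function name and the sample values are hypothetical.

```python
from statistics import mean, quantiles


def decile_table(values):
    """Return [(percent, value)] for 0 %, 10 %, ..., 100 % of the sample."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # Nine interior cut points; "inclusive" treats the sample as covering
    # the full range, so the minimum and maximum are the 0 % and 100 % rows.
    cuts = quantiles(data, n=10, method="inclusive")
    points = [data[0], *cuts, data[-1]]
    return [(10 * i, p) for i, p in enumerate(points)]


if __name__ == "__main__":
    # Hypothetical, heavily skewed measurements (not data from the slides).
    sample = [1, 1, 2, 5, 10, 40, 111, 900, 2554, 28055, 1088599]
    for percent, value in decile_table(sample):
        print(f"{percent:>3} %  {value:>12.1f}")
    # With a long tail the mean sits far above the median.
    print(f"mean = {mean(sample):.1f}")
```

For distributions like this, a table of deciles is usually more informative than a single average, because a few very large values dominate the mean.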

SLIDE 36

ABOUT THOSE WHO OPERATE IT (AND THOSE WHO USE IT)

SLIDE 37

PROJECT MANAGEMENT

INTRANET

SLIDE 38

PROJECT MANAGEMENT

INTRANET

SLIDE 39

ABOUT THOSE WHO DEVELOP

SLIDE 40

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

THE APPLICATION PORTING WORKFLOW:

SOURCE: https://hbp-hpc-platform.fz-juelich.de/?page_id=732

SLIDE 41

"THE FUNCTION OF GOOD SOFTWARE IS TO MAKE THE COMPLEX APPEAR TO BE SIMPLE"

Grady Booch

SOURCE: Booch, G. Object-Oriented Analysis and Design with Applications (2007)

SLIDE 42

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

SCIENTIFIC SOFTWARE

“the new breed of scientist must be a broadly-trained expert in statistics, in computing, in algorithm-building, in software design”

“open, well-documented, and well-tested scientific code is essential not only to reproducibility in modern scientific research, but to the very progression of research itself”

“an article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”

“academia has been singularly successful at discouraging these very practices that would contribute to its success”

Jake Vanderplas: http://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
Buckheit & Donoho: “Wavelab and Reproducible Research”, http://www-stat.stanford.edu/~wavelab/
Elsevier Executable Paper Challenge: http://www.executablepapers.com/

SLIDE 43

WHAT’S YOUR ROLE IN THIS STORY?

Gilles Allain

SOURCE: http://blog.khinsen.net/posts/2020/05/18/an-open-letter-to-software-engineers-criticizing-neil-ferguson-s-epidemics-simulation-code/
slide-44
SLIDE 44

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

TECHNIQUES FOR TAMING TECHNICAL COMPLEXITY

▸ Rapid prototyping
▸ Model-driven development
▸ (To mention my beloved ones…)

SLIDE 45

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

RAPID PROTOTYPING

[Diagram labels: Efficiency (compiled language, parallel compositionality); Productivity (dynamic languages, sequential compositionality); Prototyping Programming Interface]

Inspiration: Krste Asanovic, Rastislav Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, Katherine Yelick. Communications of the ACM, pages 56-67. http://doi.org/10.1145/1562764.1562783

SLIDE 46

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

MODEL-DRIVEN DEVELOPMENT

[Diagram labels: Refactoring; MDD; Software; “Architectural code”; Application model; DSL (per family!); Generator; Generic code; Specific code; Platform code]

Inspiration: Thomas Stahl, Markus Voelter, and Krzysztof Czarnecki. 2006. Model-Driven Software Development: Technology, Engineering, Management. John Wiley & Sons, Inc., Hoboken, NJ, USA.
slide-47
SLIDE 47

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

INNOVATIVE PARALLEL FINITE ELEMENT SOLVERS — IPES

▸ Develop, analyze and validate innovative multiscale numerical models and methods through the use of modern mathematical and computational techniques and strategies for deployment on massively parallel architectures
▸ Contribute to multidisciplinary human-resources training

PETROBRAS, INRIA, UDEC, IUT Lyon, Univ. of Strathclyde, Univ. Grenoble Alpes

SLIDE 48

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

PROJECTS INVOLVING MHM

▸ My role in these projects: the software of course!

PADEF

slide-49
SLIDE 49

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

THE MSL SET OF LIBRARIES

▸ Expresses variational formulations symbolically evaluated at compile-time and numerically evaluated at runtime
▸ Supports classical and MHM-based variational formulations
▸ Hybrid parallelization (OpenMP and MPI):
  ▸ Assembly of integrals
  ▸ Solution of linear system(s)
  ▸ Post-processing of solution

SLIDE 50

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

EFFICIENCY-ORIENTED DEBUGGING

slide-51
SLIDE 51

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

CHARACTERIZING AND FIXING MEMORY ALLOCATION ANOMALIES

(https://gitlab.com/enzomolion/profiling-library)

[Plot: cumulative number of allocations (1x10^8 to 6x10^8) versus allocation size (log scale, 1 to 262144). Series: Not optimized; After 1st iteration; After 2nd iteration; After 3rd iteration]
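
The plot on this slide relates allocation size to the cumulative number of allocations. As a rough sketch of how such a curve could be built from a trace of allocation sizes, the Python function below groups sizes into power-of-two buckets and accumulates the counts. The trace format and the function name are assumptions for illustration; they do not describe the output of the profiling library linked above.

```python
from collections import Counter


def cumulative_allocations(sizes):
    """Return [(bucket_upper_bound, cumulative_count)] in increasing size order.

    `sizes` is a hypothetical trace: one positive integer (bytes) per allocation.
    Each size is assigned to the smallest power of two that can hold it.
    """
    buckets = Counter()
    for size in sizes:
        if size <= 0:
            raise ValueError("allocation sizes must be positive")
        bound = 1 << (size - 1).bit_length()  # smallest power of two >= size
        buckets[bound] += 1

    curve, running = [], 0
    for bound in sorted(buckets):
        running += buckets[bound]
        curve.append((bound, running))
    return curve


if __name__ == "__main__":
    # Hypothetical trace dominated by tiny allocations.
    trace = [1, 3, 4, 16, 16, 17]
    for bound, count in cumulative_allocations(trace):
        print(f"<= {bound:>6} bytes: {count} allocations")
```

A curve that rises steeply at small sizes suggests many tiny allocations, which are often candidates for pooling or reuse.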
SLIDE 52

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

IMBALANCE ASSESSMENT (www.github.com/lapesd/libgomp)

[Plots: time (ms) versus simulation time-step, for Local Problems and Post-Processing. Panels: OpenMP dynamic strategy; binLPT strategy]
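
The slide compares OpenMP's dynamic schedule with the binLPT strategy but does not show how either distributes work. For intuition only, the sketch below implements the classic LPT (Longest Processing Time first) heuristic that gives binLPT its name. It is not the binLPT algorithm from the linked repository, and the task costs, function names, and imbalance metric are illustrative assumptions.

```python
import heapq


def lpt_schedule(costs, workers):
    """Classic LPT: give the costliest remaining task to the least-loaded worker.

    Returns (assignment, loads), where assignment[w] lists the task indices
    given to worker w and loads[w] is the sum of their costs.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    assignment = [[] for _ in range(workers)]
    by_cost = sorted(range(len(costs)), key=lambda t: costs[t], reverse=True)
    for task in by_cost:
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + costs[task], w))
    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


def imbalance(loads):
    """One common metric: how far the busiest worker is above the mean load."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg - 1 if avg else 0.0


if __name__ == "__main__":
    # Hypothetical per-task costs, e.g. estimated time of each local problem.
    assignment, loads = lpt_schedule([7, 5, 4, 3, 1], workers=2)
    print(assignment, loads, imbalance(loads))
```

LPT needs a cost estimate per task before the loop starts, whereas a dynamic schedule hands out work at run time without any estimate.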
SLIDE 53

HPC SOFTWARE DEVELOPMENT: A VIEWPOINT

REAL APPLICATIONS

SLIDE 54

FINAL REMARKS

SLIDE 55

CONCLUDING REMARKS

HPC

▸ Many dimensions, none simple
▸ Need for human resources, more than physical resources!
▸ Bringing domain experts and HPC experts closer together

SLIDE 56

OBRIGADO! THANK YOU! ¡GRACIAS! MERCI! DANKE!

atagomes@lncc.br
http://www.lncc.br/~atagomes