VECPAR’10
HPC Environment Management: New Challenges in the Petaflop Era
Jonas Dias jonas@nacad.ufrj.br Albino Aveleda bino@nacad.ufrj.br
Agenda
1. Introduction
2. Available Tools
   1. Deployment
   2. Monitoring
   3. Proprietary Solutions
6/24/2010 HPC Environment Management: New Challenges in the Petaflop Era 2
– Universities – Research centers – Experiments, simulations – Industry Sector
– List of nodes – Hierarchical approach
– Expert and non‐expert managers
– Integration – Usability
– LEMMing
– OSCAR – Rocks – xCAT
– Cacti – Ganglia – Nagios
– GUI – Out of the box installation
– Node adding and removal – Changes in properties
– MPI – Queue system
Tool    Cluster Installation   Node Adding                    MPI   Queuing System   Monitoring Tool
OSCAR   GUI                    GUI + Network listening        Yes   Yes              Yes
Rocks   GUI                    UI + Network listening         Yes   Yes              Yes
xCAT    Command Line           Command Line + Manual Adding   No    No               No
– Easy access
– Plug‐ins
Tool      Web Based   RIA   Send Alert   Plugins   Monitoring focus
Cacti     Yes         No    No           Yes       Network
Ganglia   Yes         No    No           No        Cluster/Grid
Nagios    Yes         No    Yes          Yes       Network
– OSCAR, Rocks, xCAT, Ganglia, Nagios, Cacti…
– Hardware specific
– Different vendors
– Resources from different machines
– Organized and customized
– Simple form
Before: node 0, node 1, node 2, node 3, node 4, node 5, node 6, node 7, node 8
After expansion: node 0, node 1, node 2, node 3, node 4, node 5, node 6, node 7, node 8, node 9, node 10, node 11, node 12
– Hierarchical approach
Before: c0n0, c0n1, c0n2, c0n3, c0n4, c1n0, c1n1, c1n2, c1n3
After expansion: c0n0, c0n1, c0n2, c0n3, c0n4, c1n0, c1n1, c1n2, c1n3, c0n5, c0n6, c1n4, c1n5
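The two naming schemes can be contrasted with a short sketch. It is only an illustration under assumptions: the function names (flat_names, hierarchical_names, group_by_cluster) are hypothetical and not taken from any of the tools discussed, and the c<cluster>n<node> pattern is inferred from the node labels shown on the slide.

```python
from collections import defaultdict


def flat_names(count):
    """Simple form: one global index per node (node 0, node 1, ...)."""
    return [f"node {i}" for i in range(count)]


def hierarchical_names(nodes_per_cluster):
    """Hierarchical form: c<cluster>n<node>, indexed within each cluster."""
    return [
        f"c{cluster}n{node}"
        for cluster, size in enumerate(nodes_per_cluster)
        for node in range(size)
    ]


def group_by_cluster(names):
    """Recover the cluster of each node from its hierarchical name."""
    groups = defaultdict(list)
    for name in names:
        cluster, _, _ = name.partition("n")  # "c0n5" -> "c0"
        groups[cluster].append(name)
    return dict(groups)


# Expansion as on the slide: 9 nodes grow to 13.
before = hierarchical_names([5, 4])
after = hierarchical_names([7, 6])
added = sorted(set(after) - set(before))
```

With flat names, the four added nodes are just node 9 to node 12 and say nothing about where they sit. With hierarchical names, the added nodes (c0n5, c0n6, c1n4, c1n5) carry their cluster in the name, so a tool can group them without a separate lookup table.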
– Less dependent – Great usability
– Many failures
– Detect and solve problems faster
– Web Services – Coupled to the supercomputer – API
– Web application – Independent of the cluster
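The split between a service coupled to the machine and a web application that is independent of the cluster can be sketched as follows. This is a minimal illustration, not the actual interface of the tool presented here: the function names, the JSON layout, and the "up" flag are all assumptions made for the example.

```python
import json


def export_status(cluster_name, node_states):
    """Service side (hypothetical): runs next to the cluster and
    serializes the node states it can observe into a JSON document."""
    nodes = [{"name": name, "up": up} for name, up in sorted(node_states.items())]
    return json.dumps({"cluster": cluster_name, "nodes": nodes})


def failed_nodes(payload):
    """Web application side (hypothetical): works only from the JSON
    document, so it needs no direct access to the cluster."""
    doc = json.loads(payload)
    return [node["name"] for node in doc["nodes"] if not node["up"]]


payload = export_status("c0", {"c0n0": True, "c0n1": False, "c0n2": True})
down = failed_nodes(payload)
```

Because the application depends only on the exchanged document, the same front end could in principle be pointed at several clusters, each exposing its own service.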
– Heterogeneous machines – Many nodes per cluster
– Integrate the management and monitoring software stacks of multiple clusters – Rich internet application – Open source model – Use of available tools
– http://lemm.sf.net – Check the video demonstration
– High Performance Computing Center – Professor Alvaro Coutinho – DELL Brazil