SLIDE 1

VECPAR’10

HPC Environment Management: New Challenges in the Petaflop Era

Jonas Dias jonas@nacad.ufrj.br Albino Aveleda bino@nacad.ufrj.br

SLIDE 2

Agenda

  • 1. Introduction
  • 2. Available Tools
  • 1. Deployment
  • 2. Monitoring
  • 3. Proprietary Solutions
  • 3. LEMMing Project
  • 4. Conclusion

6/24/2010 HPC Environment Management: New Challenges in the Petaflop Era 2

SLIDE 3

Introduction

  • High Performance Computing Systems

– Universities – Research centers – Experiments, simulations – Industry Sector

  • 62.4% (November 2009 TOP500 list)
  • Petaflop barrier

SLIDE 4

Growth in the number of processors per system

SLIDE 5

Management and Monitoring Tools

  • Systems with many processors
  • Organized information

– List of nodes – Hierarchical approach

  • Usability

– Expert and non‐expert managers

SLIDE 6

Managing a Supercomputer

  • Grant secure access
  • Quickly handle defects and problems
  • Offer a queue system
  • Use monitoring tools
  • Support non-uniform infrastructure
  • Integrate with local tools
SLIDE 7

Administering an HPC center

  • Proprietary Software

– Integration – Usability

  • An open source proposal

– LEMMing

  • Single point of management
  • RIA (Rich Internet Application)
  • Customization
SLIDE 8

Available Tools

  • Deployment Tools

– OSCAR – ROCKS – xCAT

  • Monitoring Tools

– Cacti – Ganglia – Nagios

SLIDE 9

Deployment

  • Should be easy

– GUI – Out-of-the-box installation

  • Integrate management features

– Node adding and removal – Changes in properties

  • Basic HPC Tools

– MPI – Queue system

  • Offer monitoring tools

SLIDE 10

Comparison

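| Tool  | Cluster installation | Node adding                  | MPI | Queuing system | Monitoring tool |
|-------|----------------------|------------------------------|-----|----------------|-----------------|
| OSCAR | GUI                  | GUI + network listening      | Yes | Yes            | Yes             |
| Rocks | GUI                  | UI + network listening       | Yes | Yes            | Yes             |
| xCAT  | Command line         | Command line + manual adding | No  | No             | No              |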
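
In this comparison, OSCAR and Rocks add nodes by "network listening": the head node watches the network for unknown machines as they boot and registers each one as it appears (Rocks does this with `insert-ethers`, which captures the nodes' DHCP requests). The sketch below covers only the bookkeeping half of that idea and rests on stated assumptions: some listener, not shown, yields the MAC address of each booting machine, and `NodeRegistry` is a hypothetical class that gives every new MAC the next free name while ignoring repeated requests. It is not code from OSCAR or Rocks.

```python
"""Sketch of node adding by network listening (hypothetical, not OSCAR/Rocks code).

Assumption: a listener elsewhere yields the MAC address of every machine that
asks the network for a boot address; the registry names each new MAC once.
"""
from typing import Dict, Iterable


class NodeRegistry:
    """Assigns sequential node names to MAC addresses in order of first appearance."""

    def __init__(self, prefix: str = "node") -> None:
        self.prefix = prefix
        self.by_mac: Dict[str, str] = {}

    def register(self, mac: str) -> str:
        """Return the node name for `mac`, creating one if the MAC is new."""
        mac = mac.lower()  # "AA:BB:..." and "aa:bb:..." are the same NIC
        if mac not in self.by_mac:
            self.by_mac[mac] = f"{self.prefix}{len(self.by_mac)}"
        return self.by_mac[mac]

    def listen(self, requests: Iterable[str]) -> Dict[str, str]:
        """Consume a stream of boot requests; retries from known nodes are ignored."""
        for mac in requests:
            self.register(mac)
        return dict(self.by_mac)


if __name__ == "__main__":
    # Simulated request stream: the second machine retries its request once.
    stream = ["00:1A:2B:00:00:01", "00:1a:2b:00:00:02", "00:1A:2B:00:00:02"]
    print(NodeRegistry().listen(stream))
```

Naming in arrival order is the simplest policy, but it also means boot order decides which physical machine gets which name.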

SLIDE 11

Monitoring Tools

  • Web based

– Easy access

  • Rich Internet Application
  • Alert sending
  • Customizable

– Plug‐ins

  • The monitoring focus
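
Alert sending and plug-ins usually go together. Nagios, for example, runs small external check programs and raises alerts based on the state they report. Below is a minimal sketch of such a check, assuming the standard Nagios plugin convention: exit code 0 means OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, and the plugin prints one status line with optional performance data after `|`. The load-average metric and the 8/16 thresholds are arbitrary examples, and the function names are hypothetical.

```python
"""Minimal Nagios-style check plugin (sketch; metric and thresholds are examples)."""
import os
from typing import Tuple

# Nagios plugin states, reported to Nagios through the process exit code.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value: float, warn: float, crit: float) -> int:
    """Map a metric to a plugin state; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(warn: float = 8.0, crit: float = 16.0) -> Tuple[int, str]:
    """Check the 1-minute load average; return (state, status line)."""
    try:
        load1 = os.getloadavg()[0]  # Unix only; missing or failing elsewhere
    except (AttributeError, OSError):
        return UNKNOWN, "LOAD UNKNOWN - load average not available"
    state = classify(load1, warn, crit)
    # Text before "|" is for humans; after it is performance data: label=value;warn;crit
    return state, f"LOAD {LABELS[state]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"


def main() -> int:
    """Print the status line and return the state that should become the exit code."""
    state, line = check_load()
    print(line)
    return state


if __name__ == "__main__":
    main()  # a deployed plugin would call sys.exit(main()) instead
```

Keeping `classify` separate from the data source means the same threshold logic can back any metric a site wants to alert on.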

SLIDE 12

Comparison

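| Tool    | Web based | RIA | Send alert | Plugins | Monitoring focus |
|---------|-----------|-----|------------|---------|------------------|
| Cacti   | Yes       | No  | No         | Yes     | Network          |
| Ganglia | Yes       | No  | No         | No      | Cluster/Grid     |
| Nagios  | Yes       | No  | Yes        | Yes     | Network          |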
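
Ganglia's cluster/grid focus shows in its data format: each `gmond` daemon answers a plain TCP connection (port 8649 by default) with an XML dump of every host and metric it knows, which an integrating front end can reuse. The sketch below parses a trimmed, made-up sample that uses the same element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`). `host_metric` and `fetch_gmond` are illustrative helpers, not part of Ganglia.

```python
"""Read per-host metrics from a Ganglia gmond XML dump (sketch with sample data)."""
import socket
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Trimmed, made-up sample that uses gmond's element and attribute names.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="cluster0">
    <HOST NAME="c0n0" IP="10.1.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="8" TYPE="uint16"/>
    </HOST>
    <HOST NAME="c0n1" IP="10.1.0.2">
      <METRIC NAME="load_one" VAL="7.90" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="8" TYPE="uint16"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the whole XML dump that gmond writes to a new TCP connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)  # bytes, so the parser honours the XML encoding declaration


def host_metric(xml_doc: Union[str, bytes], metric: str) -> Dict[str, str]:
    """Return {host name: value} for one metric across all clusters in the dump."""
    root = ET.fromstring(xml_doc)
    values: Dict[str, str] = {}
    for host in root.iter("HOST"):
        for m in host.findall("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = m.get("VAL")
    return values


if __name__ == "__main__":
    print(host_metric(SAMPLE, "load_one"))  # {'c0n0': '0.42', 'c0n1': '7.90'}
```

Against a live node, `host_metric(fetch_gmond("headnode"), "load_one")` would do the same, with `headnode` standing in for a real host name.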

SLIDE 13

Proprietary solutions

  • Usually built on open source applications

– OSCAR, Rocks, xCAT, Ganglia, Nagios, Cacti…

  • Tune the cluster configuration
  • Proprietary tools for administration

– Hardware specific

  • Poor integration

– Different vendors

SLIDE 14

Challenges for an HPC environment

  • Increasing number of processors
  • Heterogeneous environments

– Resources from different machines

  • Particular/Local tools
  • Administrators with different levels of knowledge

  • Present available resources as a whole

– Organized and customized

SLIDE 15

An example

  • Node naming and organization

– Simple form

Before expansion: node 0 to node 8
After expansion: node 0 to node 12 (node 9 to node 12 are appended to the end of the list)

SLIDE 16

An example

  • Node naming and organization

– Hierarchical approach

Before expansion: c0n0 to c0n4, c1n0 to c1n3
After expansion: c0n0 to c0n6, c1n0 to c1n5 (c0n5 and c0n6 are added under c0, c1n4 and c1n5 under c1)
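
The difference between the two schemes can be made concrete with a small naming sketch that reproduces the example: nine nodes grow to thirteen, and in the hierarchical scheme each new node lands in its own group without any existing name changing. The slides do not say what the `c` level stands for (cabinet, chassis, sub-cluster), so the code treats it as a plain group index. All function names are hypothetical.

```python
"""Flat vs. hierarchical node naming, following the example (hypothetical helpers)."""
from typing import Dict, List


def flat_names(count: int) -> List[str]:
    """node0 ... node{count-1}: list position is the only information in the name."""
    return [f"node{i}" for i in range(count)]


def hierarchical_names(nodes_per_group: Dict[int, int]) -> List[str]:
    """c{g}n{i} names, e.g. {0: 5, 1: 4} -> c0n0..c0n4, c1n0..c1n3."""
    return [f"c{g}n{i}" for g in sorted(nodes_per_group) for i in range(nodes_per_group[g])]


def expand(nodes_per_group: Dict[int, int], added: Dict[int, int]) -> List[str]:
    """Names created by adding nodes to groups; existing names are never changed."""
    new: List[str] = []
    for g in sorted(added):
        start = nodes_per_group.get(g, 0)  # continue numbering inside the group
        new += [f"c{g}n{i}" for i in range(start, start + added[g])]
    return new


def group_of(name: str) -> int:
    """Recover the group index from a hierarchical name such as 'c1n4'."""
    return int(name[1:name.index("n")])


if __name__ == "__main__":
    print(flat_names(13)[9:])                   # ['node9', 'node10', 'node11', 'node12']
    print(expand({0: 5, 1: 4}, {0: 2, 1: 2}))   # ['c0n5', 'c0n6', 'c1n4', 'c1n5']
```

Because the group can be parsed back out of the name, a management interface can build a tree view directly from the node list instead of keeping a separate mapping.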

SLIDE 17

LEMMing Project

  • Inspired by the Zimbra Collaboration Suite
  • Use Open Source tools
  • Use AJAX technologies
  • LEMMing is not an extension

– Less dependent – Great usability

SLIDE 18

What is LEMMing?

  • Linux Enterprise Management and Monitoring
  • Cluster with thousands of nodes

– Many failures

  • Flexibility
  • Ease of adding features
  • Great usability

– Detect and solve problems faster
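
A standard back-of-the-envelope estimate makes explicit why thousands of nodes mean many failures. Assume node failures are independent and exponentially distributed with a per-node rate λ_node (mean time between failures MTBF_node). The rates then add up across N nodes, so the time between failures anywhere in the machine shrinks in proportion to N. The numbers in the second line are illustrative, not measurements from the slides.

```latex
\lambda_{\text{sys}} = \sum_{i=1}^{N} \lambda_{\text{node}} = N\,\lambda_{\text{node}},
\qquad
\text{MTBF}_{\text{sys}} = \frac{1}{\lambda_{\text{sys}}} = \frac{\text{MTBF}_{\text{node}}}{N}

N = 5000,\quad \text{MTBF}_{\text{node}} = 5\ \text{years} \approx 43\,800\ \text{h}
\;\Rightarrow\;
\text{MTBF}_{\text{sys}} \approx \frac{43\,800\ \text{h}}{5000} \approx 8.8\ \text{h}
```

At that scale an operator would see almost three node failures a day, so detecting and locating problems quickly matters far more than on a small cluster.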

SLIDE 19

Features

  • Free and open source
  • Web Service based
  • AJAX interface design
  • Integration of other tools
  • Single point of management
  • Tested with Rocks clusters
  • Support for many cluster topologies and organizations
  • Integrated with workload management
  • Parallel shell tools
  • Customizable Dashboard
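
Parallel shell tools (pdsh is a well-known example) run the same command on many nodes at once and collect the results per node. Here is a minimal sketch of the idea, assuming password-less `ssh` from the front end to each node. `pshell`, `ssh_runner` and `failed` are hypothetical names, and the runner is a parameter so the fan-out logic can be tried without a cluster.

```python
"""Parallel-shell sketch: run one command on many nodes concurrently (hypothetical)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, str]  # (exit status, combined output)


def ssh_runner(node: str, command: str, timeout: float = 30.0) -> Result:
    """Run `command` on `node` over ssh; BatchMode fails instead of prompting."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", node, command],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 255, "timed out"  # 255 mirrors the status ssh uses for its own errors
    except OSError as exc:  # e.g. no ssh binary on this machine
        return 255, str(exc)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def pshell(nodes: List[str], command: str,
           runner: Callable[[str, str], Result] = ssh_runner,
           fanout: int = 32) -> Dict[str, Result]:
    """Run `command` on every node with at most `fanout` sessions at a time."""
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        results = pool.map(lambda n: runner(n, command), nodes)  # keeps node order
        return dict(zip(nodes, results))


def failed(results: Dict[str, Result]) -> List[str]:
    """Nodes whose command exited non-zero, i.e. the ones an admin should look at."""
    return sorted(n for n, (rc, _) in results.items() if rc != 0)


if __name__ == "__main__":
    # Fake runner standing in for ssh: pretend c1n2 is unreachable.
    def fake(node: str, command: str) -> Result:
        return (255, "unreachable") if node == "c1n2" else (0, f"{node}: ok")

    res = pshell(["c0n0", "c0n1", "c1n2"], "uptime", runner=fake)
    print(failed(res))  # ['c1n2']
```

The `fanout` bound keeps the front end from opening thousands of ssh sessions at once on a large cluster.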

SLIDE 20

LEMMing Modules

  • LEMM‐WS

– Web Services – Coupled to the supercomputer – API

  • LEMM‐GATE

– Web application – Independent of the cluster
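
The split between a cluster-side service and a cluster-independent web application can be illustrated with the standard library alone. This sketch assumes a hypothetical `GET /nodes` endpoint that returns JSON and is not LEMMing's actual API. `make_service` plays the role of a per-cluster service like LEMM-WS. `gate_overview` plays the role of the gateway: it only speaks HTTP, so it can aggregate several clusters without running on any of them.

```python
"""Cluster-side status services plus an aggregating gateway (hypothetical API)."""
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, List

# Local demo only: ignore any HTTP proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_service(cluster: str, nodes: Dict[str, str]) -> ThreadingHTTPServer:
    """Start a per-cluster service (the LEMM-WS role) on a free local port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/nodes":  # hypothetical endpoint name
                self.send_error(404)
                return
            body = json.dumps({"cluster": cluster, "nodes": nodes}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging in the demo

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def gate_overview(base_urls: List[str]) -> Dict[str, Dict[str, str]]:
    """Gateway role: query every cluster service over HTTP and merge one view."""
    view: Dict[str, Dict[str, str]] = {}
    for url in base_urls:
        with _OPENER.open(url + "/nodes", timeout=5) as resp:
            data = json.load(resp)
        view[data["cluster"]] = data["nodes"]
    return view


if __name__ == "__main__":
    services = [make_service("clusterA", {"c0n0": "up", "c0n1": "down"}),
                make_service("clusterB", {"c0n0": "up"})]
    urls = [f"http://127.0.0.1:{s.server_address[1]}" for s in services]
    print(gate_overview(urls))
    for s in services:
        s.shutdown()
        s.server_close()
```

With this split, adding a cluster to the overview only means adding one more service URL to the gateway.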

SLIDE 21

LEMMing Modules Relationship

SLIDE 22

LEMM‐GATE interface

SLIDE 23

LEMM‐GATE interface

SLIDE 24

Conclusion

  • Huge HPC centers

– Heterogeneous machines – Many nodes per cluster

  • LEMMing

– Integrates the management and monitoring software stacks of multiple clusters – Rich Internet Application – Open Source model – Uses available tools

SLIDE 25

Future Work

  • Add support for different cluster systems
  • IPMI support
  • Queue management
  • Visit us:

– http://lemm.sf.net – Check the video demonstration

SLIDE 26

Acknowledgments

  • The authors thank:

– High Performance Computing Center – Professor Alvaro Coutinho – DELL Brazil

SLIDE 27

Thanks!