PAD Cluster: An Open, Modular and Low Cost High Performance Computing System
Volnys Borges Bernal, Sergio Takeo Kofuji, Guilherme Matos Sipahi, Marcio Lobo Netto
Laboratório de Sistemas Integráveis, EPUSP
Alan G. Anderson
Elebra Defesa e
Agenda
- Main Objectives
- PAD Cluster Environment
- PAD Cluster Architecture
- Communication Libraries
- System Administrator Tools
- Operator Tools
- User Tools
- Development Environment
PAD Cluster
- Main goals
– Parallel Cluster-Based Computing Environment
- Based on Commodity Components
- High Performance Communication Medium
- Development Environment for Fortran77, Fortran90 & HPF
- MPI Interface
- IEEE POSIX UNIX Interface
- X-Windows Interface
– Initial Application:
- RAMS (Regional Atmospheric Modeling System)
- Development: LSI-EPUSP + Elebra, with FINEP support
PAD Cluster
- Characteristics
– Use of High-Performance Commodity Components
– Linux Operating System
- Important:
– Integration
- Hardware components
- Software subsystems
PAD Cluster Environment
- Configuration & Operation
– ClusterMagic: Configuration & Replication
– Multiconsole
– Cluster Partitioning
– Monitoring System
- User Interface and Utilities
– CDE Windows Interface
– LSF Job Scheduling
– PAD-ptools: Parallel UNIX Utilities
– POSIX UNIX Interface
- Development Tools
– Compilers: GNU C, C++, F77; Portland F77, F90, HPF
– Tools: Portland Profiler; Portland Debugger (F77, F90)
– Libraries: BLAS, BLACS, LaPack, ScaLaPack; MPI (MPICH, FULL, MPICH-FULL); Myrinet API/BPI
PAD Cluster Architecture
- System Architecture
– Processing nodes
– Access workstation
– Administration workstation
– Fast-Ethernet switch
– Myrinet switch
– Synchronization hardware
[Figure: system diagram with eight processing nodes, the Myrinet switch, the synchronization hardware, the Fast-Ethernet switch, the administration workstation, the access workstation, multi-serial lines and a link to the external network]
PAD Cluster Architecture
- Node Architecture
[Figure: node block diagram: two Intel Pentium II 333 MHz processors, RAM, PCI bridge, Myrinet controller, Fast Ethernet controller, SCSI controller, LM78 hardware monitor]
Communication Infrastructure
- Primary Network
– Fast-Ethernet
– General-purpose network
- For traditional network services (NFS, DNS, SNMP, XNTP, …)
– Operating System TCP/IP Stack
Communication Infrastructure
- High Performance Network
– Myrinet
– For application data
– Communication Libraries:
- MPICH over Operating System TCP/IP Stack
- FULL user-level interface library
- MPICH-FULL user-level interface library
Communication Libraries
- MPICH Library
– MPI over TCP/IP stack
- FULL Library
– User-level communication library
– Developed at LSI-EPUSP in 1998
– Implementation based on Cornell's U-Net
- MPICH-FULL Library
– User-level communication library
– Internode communication: MPICH + FULL
– Intranode communication: MPICH + Shared Memory
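MPICH-FULL's split between an internode path (FULL) and an intranode path (shared memory) amounts to a per-pair channel choice. The Python sketch below models only that rule; it is not the library's code. `Endpoint`, `choose_channel` and `routing_table` are hypothetical names, and the placement (four ranks, two per dual-processor node) is an illustrative example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical model of the MPICH-FULL rule: peers on the same node talk
# through shared memory, peers on different nodes talk through FULL.


@dataclass(frozen=True)
class Endpoint:
    rank: int  # MPI rank
    node: str  # name of the node hosting this rank


def choose_channel(src: Endpoint, dst: Endpoint) -> str:
    """Return the transport used between two ranks."""
    return "shared-memory" if src.node == dst.node else "FULL"


def routing_table(placement: Dict[int, str]) -> Dict[Tuple[int, int], str]:
    """Map every ordered (src, dst) pair of distinct ranks to its transport."""
    eps = {rank: Endpoint(rank, node) for rank, node in placement.items()}
    return {
        (a, b): choose_channel(eps[a], eps[b])
        for a in eps
        for b in eps
        if a != b
    }


# Four ranks, two per dual-processor node (illustrative placement).
placement = {0: "node0", 1: "node0", 2: "node1", 3: "node1"}
table = routing_table(placement)
```

Changing `placement` is enough to see, in this model, how packing more ranks per node moves pairs from FULL onto shared memory.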
Communication Libraries
- MPICH-FULL performance
[Figure: Performance of Myrinet with MPICH-FULL, bandwidth (MBytes/s) vs. message size (bytes), three panels:
(a) 2 processes (1 process per node), two 333 MHz dual nodes
(b) 4 processes (2 processes per node), two 333 MHz dual nodes
(c) Shared memory, 2 processes in one node, one 333 MHz dual node]
Communication Infrastructure
- Synchronization Hardware
– Support for collective MPI operations
– Implemented in FPGA
– Interfaces for 8 nodes
– Based on PAPERS
– Operations
- barrier
- broadcast
- allgather
- allreduce
– Global Wall Clock
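The four collective operations above have simple semantics that can be stated as a software model. The Python sketch below models only what each rank ends up holding and, for the barrier, the ordering guarantee; it says nothing about how the FPGA hardware implements them. All function names are hypothetical, and the default of 8 ranks in the barrier simply mirrors the 8-node interface.

```python
import threading
from functools import reduce
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def broadcast(values: List[T], root: int) -> List[T]:
    """After a broadcast, every rank holds the root's value."""
    return [values[root] for _ in values]


def allgather(values: List[T]) -> List[List[T]]:
    """After an allgather, every rank holds all contributions in rank order."""
    return [list(values) for _ in values]


def allreduce(values: List[T], op: Callable[[T, T], T] = add) -> List[T]:
    """After an allreduce, every rank holds op folded over all contributions."""
    result = reduce(op, values)
    return [result for _ in values]


def barrier_trace(n_ranks: int = 8) -> List[str]:
    """Log events around a barrier: no rank leaves before every rank arrives."""
    barrier = threading.Barrier(n_ranks)
    lock = threading.Lock()
    events: List[str] = []

    def rank_body() -> None:
        with lock:
            events.append("arrive")
        barrier.wait()  # blocks until all n_ranks threads have called wait()
        with lock:
            events.append("leave")

    threads = [threading.Thread(target=rank_body) for _ in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

For example, `allreduce([1, 2, 3, 4])` gives every rank `10`, and `allreduce(values, max)` gives every rank the global maximum.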
Communication Infrastructure
- Serial Lines
– Connects each node to the administration workstation
– Allows remote console access from the administration workstation
System Administrator Tools
- ClusterMagic
– Two main functions:
- Cluster Configuration
- Node Replication
– Advantages
- Easy configuration/reconfiguration
- Guaranteed uniformity across nodes
- Fast node replication
System Administrator Tools
- Cluster Magic: Cluster Configuration
[Figure: ClusterMagic configuration flow. The operator edits cluster.conf; ClusterMagic generates node common files, node-specific files, bootptab, DNS server files and administration files. Generated files shown include hosts, hosts.equiv, rhosts, fstab, nsswitch.conf, resolv.conf, ifcfg-lo, profile, inittab, issue, issue.net, motd, HOSTNAME, lilo.conf, exports, network and ifcfg-eth0.]
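Generating every configuration file from a single cluster.conf is what keeps the nodes uniform. The slides do not show the cluster.conf syntax, so the Python sketch below assumes a minimal hypothetical one (`node <name> <ip>` per line) purely for illustration and renders two of the node common files, hosts and hosts.equiv, from it. `parse_cluster_conf`, `render_hosts`, `render_hosts_equiv` and the `pad` domain are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (hostname, IP address)


def parse_cluster_conf(text: str) -> List[Node]:
    """Parse the hypothetical 'node <name> <ip>' format, ignoring # comments."""
    nodes: List[Node] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] != "node":
            raise ValueError(f"unrecognized cluster.conf line: {raw!r}")
        nodes.append((fields[1], fields[2]))
    return nodes


def render_hosts(nodes: List[Node], domain: str) -> str:
    """Build an /etc/hosts-style file listing every node."""
    lines = ["127.0.0.1\tlocalhost"]
    lines += [f"{ip}\t{name}.{domain}\t{name}" for name, ip in nodes]
    return "\n".join(lines) + "\n"


def render_hosts_equiv(nodes: List[Node]) -> str:
    """Build a hosts.equiv file trusting every node, one hostname per line."""
    return "".join(f"{name}\n" for name, _ in nodes)
```

Because each node's copy is regenerated from the same source, adding a node in this scheme means editing one line of cluster.conf rather than touching every node.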
System Administrator Tools
- Cluster Magic: Node Replication
– Node installation based on the replication of a “Womb Node”
– ClusterMagic replication diskette:
- Boots a small Linux system
- Partitions the disk
- Copies the womb image
- Installs the configuration files
- Initializes the boot sector
– Automatic process
– Takes about 12 minutes
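The replication diskette runs a fixed sequence, and only the configuration step differs from one node to the next. The Python sketch below models that as an ordered list of steps that aborts at the first failure, plus a helper that renders two node-specific files. The step callables, `replicate` and `node_specific_files` are hypothetical, and the ifcfg-eth0 keys follow common Red Hat-style syntax, which is an assumption about what ClusterMagic writes.

```python
from typing import Callable, Dict, List

# Step order taken from the slide; the callables that perform them are hypothetical.
STEPS: List[str] = [
    "boot small Linux system",
    "partition disk",
    "copy womb image",
    "install configuration files",
    "initialize boot sector",
]


def replicate(actions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each step in order; abort on the first step that reports failure."""
    completed: List[str] = []
    for step in STEPS:
        if not actions[step]():
            raise RuntimeError(f"replication failed at step: {step}")
        completed.append(step)
    return completed


def node_specific_files(
    hostname: str, ip: str, netmask: str = "255.255.255.0"
) -> Dict[str, str]:
    """Render the per-node files that distinguish a replica from the womb node."""
    ifcfg = (
        "DEVICE=eth0\n"
        "BOOTPROTO=static\n"
        f"IPADDR={ip}\n"
        f"NETMASK={netmask}\n"
        "ONBOOT=yes\n"
    )
    return {"HOSTNAME": f"{hostname}\n", "ifcfg-eth0": ifcfg}
```

Stopping at the first failed step keeps a half-replicated disk from being made bootable, since boot sector initialization comes last.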
Operator Tools
- Xadmin
– Cluster Partitioning
– Remote Commands
- Multiconsole
– Node console access
- Job Scheduling
– Job submission
– LSF integrated with Cluster Partitioning
- Cluster Monitoring
Operator Tools
- Xadmin
– Node partitioning
[Figure: cluster partitioning tool assigning nodes N0 to N11 to partitions P1, P2 and P3]
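Partitioning places each node in at most one partition, so jobs and remote commands can target a partition by name. The Python sketch below enforces that rule and answers the two questions a scheduler would ask: which partition owns a node, and which nodes are still free. The function names are hypothetical, and the split of N0 to N11 used in the example is illustrative; the slide does not say which nodes went into which partition.

```python
from typing import Dict, List, Sequence

ALL_NODES = tuple(f"N{i}" for i in range(12))  # N0..N11, as in the partitioning screen


def build_partition_map(
    assignment: Dict[str, List[str]], nodes: Sequence[str] = ALL_NODES
) -> Dict[str, str]:
    """Return node -> partition, rejecting unknown nodes and overlapping partitions."""
    owner: Dict[str, str] = {}
    for partition, members in assignment.items():
        for node in members:
            if node not in nodes:
                raise ValueError(f"unknown node {node}")
            if node in owner:
                raise ValueError(f"{node} is in both {owner[node]} and {partition}")
            owner[node] = partition
    return owner


def partition_members(owner: Dict[str, str], partition: str) -> List[str]:
    """Nodes belonging to one partition, in assignment order."""
    return [node for node, part in owner.items() if part == partition]


def free_nodes(owner: Dict[str, str], nodes: Sequence[str] = ALL_NODES) -> List[str]:
    """Nodes not assigned to any partition, in cluster order."""
    return [node for node in nodes if node not in owner]
```

Rejecting overlaps at assignment time means two partitions can never be handed the same node.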
Operator Tools
- Xadmin
– Remote Commands
Operator Tools
- Multiconsole
Operator Tools
- Cluster Monitoring
– Java + SNMP agents
User Tools
- PAD-ptools
– Parallel versions of UNIX utilities
– pcp, pls, pcat, …
– Integrated with cluster partitioning
- LSF
– Job submission and control
- mpirun
– MPICH, MPICH-FULL
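Parallel utilities such as pls or pcat run one command on many nodes and merge the results. The Python sketch below shows that fan-out pattern with a thread pool; it is not PAD-ptools code. `parallel_run`, `merge_output` and `rsh_runner` are hypothetical, and rsh as the remote transport is an assumption: the generated hosts.equiv and rhosts files suggest rsh-style trust, but the slides do not say how the utilities reach the nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Runner = Callable[[str, List[str]], str]


def rsh_runner(node: str, command: List[str]) -> str:
    """Assumed transport: run the command on `node` via rsh and return its stdout."""
    result = subprocess.run(
        ["rsh", node, *command], capture_output=True, text=True, check=True
    )
    return result.stdout


def parallel_run(
    nodes: List[str], command: List[str], run: Runner = rsh_runner, max_workers: int = 8
) -> Dict[str, str]:
    """Start the command on all nodes concurrently; return outputs keyed by node, in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(run, node, command) for node in nodes}
        return {node: futures[node].result() for node in nodes}


def merge_output(results: Dict[str, str]) -> str:
    """Prefix every output line with its node name so merged output stays attributable."""
    return "\n".join(
        f"{node}: {line}"
        for node, text in results.items()
        for line in text.splitlines()
    )
```

Passing the member list of one partition instead of every node is one way to get the partition integration the slide mentions; the `run` parameter also lets the fan-out be exercised without a cluster.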
Development Environment
- Portland
– Fortran77
– Fortran90
– HPF
– Profiler
– Debugger
- Libraries
– BLAS, BLACS, LaPack, ScaLaPack
- TotalView debugger
- VAMPIR profiler
Conclusions
- Complete product system:
– Elebra Vortix Cluster (PAD Cluster)
- www.elebra.com.br/aero
- Several Developments:
– Hardware
- Collective operations, synchronization and global clock
– Software
- Communication Libraries
- Cluster Tools
- Communication Drivers
Future Works
- University of São Paulo + Purdue University + University of Pittsburgh
– Hardware for collective operations and synchronization with a 64-bit PCI interface
- University of São Paulo + ICS-FORTH (Greece)
– ATM-like switch at 2.4 Gbps
- University of São Paulo
– New cluster administration, management and security tools
– High-availability database applications
Acknowledgments
- FINEP
- LSI-EPUSP Development Team
- E