HPC Operations at the Cyprus Institute
George Tsouloupas, PhD, Head of HPC Facility
George Tsouloupas (Nicosia 2015)
George Tsouloupas (@JSC 2014)
Overview
- Organization
- Hardware resources (clusters, storage, networking)
- Software (OS deployment, services and cloud infrastructure)
- Libraries and scientific software deployment using EasyBuild
- Tools
Short History
- The Cyprus Institute est. 2007
- CaSToRC (Director: Dina Alexandrou)
○ Central goal: to develop world-class research and education in computational science serving the Eastern Mediterranean, in collaboration with other regional institutions
○ Development of a national High Performance Computing centre
- Cy-Tera commissioned in Dec 2011
Cy-Tera
- Cy-Tera is the first large cluster of the Cypriot National HPC Facility
- Cy-Tera Strategic Infrastructure Project
○ A new research unit to host an HPC infrastructure
○ RPF-funded project (i.e. nationally funded)
- LinkSCEEM leverages Cy-Tera
- Contributes resources to PRACE
LinkSCEEM
Projects and Resource Allocation
- Cyprus Meteorology Service
Projects and Resource Allocation
- Semi-annual Allocation process
○ Internal technical reviews
○ External scientific reviews
- 43 Production projects to date
- 75 Preparatory projects to date
HPC Ops
Organization: Responsibilities
- Stelios Erotokritou
○ Project Liaison, Networking, System Administration
- Thekla Loizou
○ User Support, Scientific Software, System Administration.
- Andreas Panteli
○ PRACE services, System Administration, Scientific Software, Networking.
- George Tsouloupas
○ System Administration, User Support, Scientific Software, PRACE services, Networking, NCSA Liaising, HPC Ops head.
Maintenance and downtimes
- Scheduled downtime: monthly maintenance
○ 0.7% downtime
- Unscheduled downtime
○ <0.1% due to operator error
○ UPS issues: an additional 15-20 hours of downtime
- Downtime for rebuilding Cy-Tera:
○ Estimated <5%
- Still well within the promised 80% uptime
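A quick sanity check of these figures (a sketch assuming ~730 hours per month and 8,760 hours per year, neither of which is stated on the slide):

```python
# Rough sanity check of the downtime figures quoted above.
# Assumed, not from the slides: ~730 h/month, 8,760 h/year.

HOURS_PER_MONTH = 730
HOURS_PER_YEAR = 8760

scheduled = 0.007        # 0.7% scheduled (monthly maintenance)
operator = 0.001         # <0.1% unscheduled (upper bound)
ups_hours = 20           # UPS issues, upper end of the 15-20 h range
rebuild = 0.05           # <5% for the Cy-Tera rebuild (upper bound)

total_fraction = scheduled + operator + rebuild + ups_hours / HOURS_PER_YEAR
uptime = 1 - total_fraction

print(f"Monthly maintenance ~{scheduled * HOURS_PER_MONTH:.1f} h/month")
print(f"Worst-case uptime   ~{uptime:.1%}")  # comfortably above the promised 80%
```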
Hardware Resources
Resources -- Cy-Tera
- Hybrid CPU/GPU Linux cluster
- Computational power
○ 98 × 2 × 6-core compute nodes
○ Each compute node = 128 GFlops
○ 18 × 2 × 6-core + 2 × NVIDIA M2070 GPU nodes
○ Each GPU node = 1 TFlop
○ Theoretical Peak Performance (TPP) = 30.5 TFlops
○ 48 GB memory per node
- MPI messaging & storage access
○ 40 Gbps QDR InfiniBand
- Storage: 360 TB raw disk
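The 30.5 TFlops figure follows from the node counts above; a quick check using only the per-node peak numbers quoted on the slide:

```python
# Sanity check: Cy-Tera theoretical peak from the per-node figures above.

cpu_nodes = 98
gpu_nodes = 18
cpu_node_tflops = 0.128   # 128 GFlops per 12-core compute node
gpu_node_tflops = 1.0     # ~1 TFlop per GPU node (2x NVIDIA M2070)

tpp = cpu_nodes * cpu_node_tflops + gpu_nodes * gpu_node_tflops
print(f"TPP ~= {tpp:.1f} TFlops")  # → TPP ~= 30.5 TFlops
```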
Resources -- Prometheus
- Ex-PRACE prototype
- Hybrid CPU/GPU Linux Cluster
- Computational Power
- 8 x 2 x 6-core + 2 x NVIDIA M2070 GPU nodes
- 24 GB memory per node
- MPI Messaging & Storage Access
- 40 Gbps QDR InfiniBand
- Storage: 40TB raw disk
Euclid -- Training Cluster
- Hybrid CPU/GPU Linux Cluster
- Training Cluster of the LinkSCEEM project
- Computational Power
○ 6 eight-core compute nodes + 2 NVIDIA Tesla T10 processors
- 16 GB memory per node
- MPI Messaging & Storage Access
- InfiniBand network
- Storage: 40TB raw disk
- Used in-house, by universities in Cyprus and Jordan, and for workshops
Prototype Clusters
- Dell C8000 chassis
○ 2 nodes × 2 Xeon Phi + 2 nodes × 2 NVIDIA K20m
- MIC MEGWARE
○ 12 Xeon Phi Accelerators in 4 nodes.
Post-Processing
- post01 , post02
○ 128 GB RAM
○ Access to all filesystems
- Same software and modules as the clusters
○ compiled specifically for each node
Storage
- Cy-Tera storage (IBM)
○ 360 TB (raw)
■ 100 TB scratch
○ 4.7 GB/s
○ GPFS
○ Project storage
- DDN7700 (LTS)
○ GPFS
○ 1 GB/s
○ 180 TB
○ Room for another 400 TB
- DDN9900 (LTS)
○ GPFS
○ 200 TB
○ Being phased out
- “ONYX”
○ Commodity hardware
○ FhGFS/BeeGFS
○ 360 TB
- Backup
○ 80 TB
- DDN9550 (auxiliary)
○ NFS, Lustre
○ 40 TB
[Diagram: ONYX storage architecture -- 90 × 4 TB disks + 4 SSDs (metadata), SAS multipath to the servers, clients over IB RDMA / IB TCP / Ethernet TCP]
“ONYX” storage integrated from scratch
- BeeGFS over ZFS over JBODs = very good value for money (<€100/TB, including the servers!)
- Up to 3 GB/s writes (iozone)
- Around 14,000 directory creates per second
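The value-for-money claim is easy to bound; at the quoted ceiling of €100/TB (slide figure) the full 360 TB system comes in under €36,000:

```python
# Rough cost ceiling for ONYX from the quoted <100 EUR/TB figure.
capacity_tb = 360
eur_per_tb = 100          # upper bound quoted on the slide
total_eur = capacity_tb * eur_per_tb
print(f"Total hardware cost < {total_eur} EUR")  # → Total hardware cost < 36000 EUR
```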
Software (System) -- Filesystems
- Four GPFS filesystems
○ On three storage systems
○ GPFS multiclustering
○ Project storage + LTS
- FhGFS/BeeGFS
○ Home directories on Euclid
○ Home directories on Prometheus
○ New 360 TB system
Software (System) -- Deployment
- XCAT
○ Two deployment servers (separate VLANs)
■ Cy-Tera
■ Everything else
○ “Thin” deployment
- Ansible
○ Infrastructure as code, maintained in git
○ Manual configuration prohibited
Software (System) -- Services
- Cy-Tera
○ RHEL 6 x86_64
○ Torque/Moab → SLURM
- Prometheus
○ CentOS 6.5
○ SLURM
- Euclid
○ CentOS 6.5
○ Torque/Maui → SLURM
- Planck (Testing Cluster)
○ SLURM
Software -- Workload Management
- 1st SLURM test on prototype cluster in 2012
○ Basic configuration (single queue, etc.)
- Decision to move to SLURM
○ Saves ~50K in Moab licensing over three years
■ That’s about half an engineer in terms of cost
○ Uniform scheduler across systems
○ It’s much easier to set up a test environment if you don’t have to worry about licensing...
- Transition from Moab to SLURM
○ Gave users a four-month head start with access to SLURM
○ 80% of users only made the transition after they could no longer run on Moab...
SLURM Migration
- GOAL: implement the exact functionality we had in Moab
○ Routing queues for GPU/CPU and job size
○ Low-priority queues
○ Standing reservations + triggers
SLURM Migration
- Requested memory on GPU nodes (gres): when a user asked for a mem-per-cpu of more than 4 GB, the nodes were allocated but remained idle. To solve this we always set the requested memory per CPU to "0" for GPU jobs, in the job_submit plugin.
- No triggers to start job in a reservation. We used cron.
- No routing queues as in Torque; we implemented the functionality in the job_submit plugin.
- Bug in Intel MPI with SLURM concerning hostlist parsing; solved after Intel MPI version 4.1 update 3.
- Standing reservations locked into specific nodes.
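The routing-queue replacement above lives in the job_submit plugin; the decision logic is roughly the following. This is a sketch in Python for clarity only: the real plugin is a SLURM job_submit plugin (typically Lua or C), and the partition names and size threshold below are illustrative assumptions, not the production values.

```python
# Illustrative sketch of routing-queue logic of the kind implemented in a
# SLURM job_submit plugin. Partition names and thresholds are hypothetical.

def route_partition(gres: str, num_nodes: int) -> str:
    """Pick a partition based on requested GRES and job size."""
    if gres and "gpu" in gres:
        return "gpu"                # GPU jobs go to the GPU partition
    if num_nodes > 16:              # hypothetical "large job" threshold
        return "batch-large"
    return "batch"

# A GPU job is routed to the gpu partition regardless of size:
print(route_partition("gpu:2", 4))   # → gpu
# A large CPU-only job goes to the wide partition:
print(route_partition("", 32))       # → batch-large
```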
Software
Available on all systems...
- Intel Compiler Suite (optimised for Intel architecture)
- PGI Compiler Suite (including OpenACC for GPUs) (WIP for all systems)
- CUDA
- Optimised math libraries
Scientific software and Libraries
How I Learned to Stop Worrying and Love EasyBuild
Facts:
- Modules provided to users: 641
a2ps Bonnie++ CUDA GDB guile LAPACK MCL numpy Qt TiCCutils ABINIT Boost cURL Geant4 gzip libctl MEME NWChem QuantumESPRESSO TiMBL ABySS Bowtie DL_POLY_Classic GEOS Harminv libffi MetaVelvet Oases R TinySVM AMOS Bowtie2 Doxygen gettext HDF libgtextutils METIS OpenBLAS RAxML Tk ant BWA EasyBuild GHC HDF5 libharu Mothur OpenFOAM RNAz Trinity aria2 byacc Eigen git HH-suite Libint MPFR OpenMPI SAMtools UDUNITS arpack-ng bzip2 ELinks GLib HMMER libmatheval mpiBLAST OpenPGM ScaLAPACK util-linux ATLAS cairo EMBOSS GLIMMER HPL libpng MrBayes OpenSSL ScientificPython Valgrind Autoconf ccache ESMF glproto hwloc libpthread- stubs MUMmer PAML SCons Velvet bam2fastq CD-HIT ETSF_IO GMP Hypre libreadline MUSCLE PAPI SCOTCH ViennaRNA BamTools CDO expat gmvapich2 icc libsmm MVAPICH2 parallel SHRiMP VTK Bash cflow FASTA gmvolf iccifort libtool NAMD ParFlow Silo WPS bbFTP cgdb FASTX-Toolkit gnuplot ictce libunistring nano ParMETIS SOAPdenovo WRF bbftpPRO Chapel FFTW goalf ifort libxc NASM PCRE Stacks xorg-macros beagle-lib Clang FIAT gompi imkl libxml2 NCL Perl Stow xproto BFAST ClangGCC flex google-sparsehash impi libxslt nco PETSc SuiteSparse YamCha binutils CLHEP fontconfig goolf Infernal libyaml ncurses pixman Szip Yasm biodeps ClustalW2 freeglut goolfc iomkl likwid netCDF pkg-config Tar ZeroMQ Biopython CMake freetype gperf Iperf LZO netCDF-Fortran PLINK tbb zlib Bison Corkscrew g2clib grib_api JasPer M4 nettle Primer3 Tcl zsync BLACS CP2K g2lib GROMACS Java makedepend NEURON problog tcsh BLAT CRF++ GCC GSL JUnit mc numactl Python Theano
- Modules that can be provided within hours: 2238
Software
- Automated, reproducible build processes
- Maintain multiple compilers/versions
- Thousands of software packages
Targeting communities, e.g. bioinformatics
- The local team has contributed tens of bioinformatics-related packages to EasyBuild (posters at BBC13 and CSC2013)
- Galaxy server
○ Tested last summer
○ To be deployed
Scientific software and Libraries
How I Learned to Stop Worrying and Love EasyBuild
Change management and software provision:
- Rollback of individual users
eb/  eb120/  eb130/  eb130909p2/  eb140/  eb150/
eb20130520/  eb20130603/  eb20130619/  eb130607/
eb131021/  eb131127/  eb140108/
sw6 -> eb131021/
sw6p2 -> eb130909p2/
As per the buildsets concept of HPCBIOS (by Fotis Georgatos):
https://fosdem.org/2014/schedule/event/hpc_devroom_hpcbios
Tools of the trade
Ticketing system -- Jira
User Support
Created vs. resolved tickets
Tools of the trade -- Confluence wiki
Icinga(/Nagios)
Tools of the trade -- Monitoring: Cluvis (in-house developed tool)
Infrastructure as code and why you should care.
Treat the configuration of systems the same way that software source code is treated.
- Use a configuration management tool!
○ Puppet, CFengine, Chef, Ansible ...
- Documented installation/configuration
○ Testing!
- Versioned, source-controlled
- Idempotence
○ Ability to “re-run” configuration
○ Operations can be applied multiple times without changing the result beyond the initial application
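To illustrate the idempotence point: an Ansible task declares a desired state rather than an action, so applying it repeatedly changes nothing after the first run. A minimal hypothetical play (not one of the site's actual roles; the package and service names are illustrative):

```yaml
# Hypothetical example: these tasks are idempotent -- running the play
# twice leaves the system unchanged after the first run.
- name: Ensure NTP is installed and running
  hosts: nodes
  tasks:
    - name: Install ntp package
      yum:
        name: ntp
        state: present      # a state ("present"), not an action ("install")

    - name: Ensure ntpd is enabled and started
      service:
        name: ntpd
        enabled: yes
        state: started
```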
Infrastructure as code and why you should care.
We picked Ansible for several reasons:
- Python-based (even though we have not written a line of Python code yet)
- SSH-based (agentless)
- Looked simple enough
- Really easy to get started
Short example: euclid Inventory file
[headnodes]
euclid mgmt_addr=172.30.150.99 vlan110_addr=10.20.110.11 ib_addr=172.31.150.99

[nodes]
e01 mgmt_addr=172.30.150.1 ib_addr=172.31.150.1
e02 mgmt_addr=172.30.150.2 ib_addr=172.31.150.2
e03 mgmt_addr=172.30.150.3 ib_addr=172.31.150.3
e04 mgmt_addr=172.30.150.4 ib_addr=172.31.150.4
e05 mgmt_addr=172.30.150.5 ib_addr=172.31.150.5
e06 mgmt_addr=172.30.150.6 ib_addr=172.31.150.6

[accounting]
euclid

[monitoring_servers]
euclid

[all:vars]
ldap_resource_name=euclid
home_dir_base=/fhgfs/euclid/home
shared_fs=/fhgfs/euclid
module_buildset=eb141014
ldap_server=euclid
ldap_server_secondary=ldap.cyi.ac.cy
eb_sources_path=/fhgfs/sources
cluster=euclid
mgmt_if=eth0
external_if=eth1
gateway=172.30.205.1
fed_version=3.12
Short example: euclid Playbook euclid.yml
- include: network.yml
- include: common.yml
- include: gpu.yml
- include: hosts.yml
- include: ib_network.yml
- include: ldap_server.yml
- include: ldap_client.yml
- include: slurm.yml
- include: fhgfs_client_blue.yml
- include: icinga_server.yml
- include: icinga_client.yml
- include: lmod.yml
- Set up networking
- Set up basic system aspects (e.g. EPEL)
- Install the NVIDIA driver
- Set up OFED and the InfiniBand network
- Set up the LDAP server (mirror) and client authentication
- Set up SLURM
- Install BeeGFS clients for the specific filesystem “blue”
- Set up monitoring
- Set up Lmod
Example: gpu.yml
Example: slurm.yml
slurm.conf_euclid.j2
...
AccountingStorageHost={{ groups['accounting'][0] }}
...
ClusterName="{{ cluster }}"
...
ControlMachine={{ groups['headnodes'][0] }}
...
StateSaveLocation={{ home_dir_base }}/slurm/state
...
UsePAM=1

{% for host in groups['gpunodes'] %}
NodeName={{ host }} CPUs={{ hostvars[host]['ansible_processor_vcpus'] }} Gres=gpu:2 Sockets={{ hostvars[host]['ansible_processor_count'] }} CoresPerSocket={{ hostvars[host]['ansible_processor_cores'] }} ThreadsPerCore={{ hostvars[host]['ansible_processor_threads_per_core'] }} RealMemory={{ hostvars[host]['ansible_memtotal_mb'] }} State=UNKNOWN
{% endfor %}

PartitionName=batch Nodes={% for host in groups['nodes'] %}{{ host }},{% endfor %} MaxTime=INFINITE State=UP Default=YES RootOnly=YES
PartitionName=cpu Nodes={% for host in groups['gpunodes'] %}{{ host }},{% endfor %} MaxTime=24:00:00 State=UP
PartitionName=gpu Nodes={% for host in groups['gpunodes'] %}{{ host }},{% endfor %} MaxTime=24:00:00 State=UP
Ansible roles developed by local Ops:
common easybuild fed fhgfs_client fhgfs_server ganglia317 ganglia_client ganglia_nvidia_gpu ganglia_server gluster_client gluster_server gpfs gpu ldap_client ldap_server lmod mpss slurm slurm_acct zfs
From the community: apache icinga2-ansible-classic-ui icinga2-ansible-no-ui icinga2-ansible-web-ui icinga2-nrpe-agent mysql network php samba