

slide-1
SLIDE 1

High Performance Computing (HPC) at UL

Present and Future Challenges

Sébastien Varrette, PhD UL HPC Management Team, Parallel Computing and Optimization Group (PCOG), University of Luxembourg (UL), Luxembourg

1 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-2
SLIDE 2

Preliminaries

Summary

1. Preliminaries
2. The UL HPC platform: Overview; Platform Management Tools; Monitoring; Statistics & Milestones
3. Incoming Milestones and Challenges

2 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-3
SLIDE 3

Preliminaries

Computing / Storage Performances

HPC: High Performance Computing

Main HPC Performance Metrics

Computing capacity/speed: often measured in Flops (or Flop/s)

→ Floating-point operations per second (often in double precision, DP)

→ GFlops = 10^9 Flops; TFlops = 10^12 Flops; PFlops = 10^15 Flops

Storage capacity: measured in multiples of bytes (1 byte = 8 bits)

→ GB = 10^9 bytes; TB = 10^12 bytes; PB = 10^15 bytes
→ GiB = 1024^3 bytes; TiB = 1024^4 bytes; PiB = 1024^5 bytes (see the quick check below)

Transfer rate on a medium: measured in Mb/s or MB/s

Other metrics: sequential vs. random R/W speed, IOPS

3 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
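To make the decimal vs. binary prefixes concrete, here is a quick check in a shell (a minimal sketch; plain 64-bit shell arithmetic and bc, nothing platform-specific):

$> echo $(( 10**12 ))                      # 1 TB  = 1 000 000 000 000 bytes
$> echo $(( 1024**4 ))                     # 1 TiB = 1 099 511 627 776 bytes
$> echo "scale=3; 1024^4 / 10^12" | bc     # 1 TiB is about 1.099 TB, i.e. ~10% larger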

slide-4
SLIDE 4

Preliminaries

Why High Performance Computing ?

"The country that out-computes will be the one that

  • ut-competes".

Council on Competitiveness

Accelerate research by accelerating computations: 14.4 GFlops (dual-core i7 @ 1.8 GHz) → 49.918 TFlops (400 computing nodes, 4284 cores)

Increase storage capacity: 2 TB (1 disk) → 4268.4 TB raw (642 disks)

Communicate faster: 1 GbE (1 Gb/s) vs Infiniband QDR (40 Gb/s)

4 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-5
SLIDE 5

Preliminaries

Computing for Researchers

Regular PC / Local Laptop / Workstation

→ Native OS (Windows, Linux, Mac, etc.)

5 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-6
SLIDE 6

Preliminaries

Computing for Researchers

Regular PC / Local Laptop / Workstation

→ Native OS (Windows, Linux, Mac, etc.)
→ Virtualized OS through a hypervisor

Hypervisor: core virtualization engine/environment. Performance loss: ≥ 20%

Examples: Xen, VMware ESXi, KVM, VirtualBox
5 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-7
SLIDE 7

Preliminaries

Computing for Researchers

Cloud Computing Platform

→ Infrastructure as a Service (IaaS)

6 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-8
SLIDE 8

Preliminaries

Computing for Researchers

Cloud Computing Platform

→ Platform as a Service (PaaS)

6 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-9
SLIDE 9

Preliminaries

Computing for Researchers

Cloud Computing Platform

→ Software as a Service (SaaS)

6 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-10
SLIDE 10

Preliminaries

Computing for Researchers

High Performance Computing platforms

→ For Speedup, Scalability and Faster Time to Solution

7 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-11
SLIDE 11

Preliminaries

Computing for Researchers

High Performance Computing platforms

→ For Speedup, Scalability and Faster Time to Solution

YET...

PC ≠ HPC

7 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-12
SLIDE 12

Preliminaries

Computing for Researchers

High Performance Computing platforms

→ For Speedup, Scalability and Faster Time to Solution

YET...

PC ≠ HPC

HPC ≃ Formula 1

→ can end badly, even after minor errors

7 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-13
SLIDE 13

Preliminaries

Jobs, Tasks & Local Execution

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-14
SLIDE 14

Preliminaries

Jobs, Tasks & Local Execution

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-15
SLIDE 15

Preliminaries

Jobs, Tasks & Local Execution

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-16
SLIDE 16

Preliminaries

Jobs, Tasks & Local Execution

./myprog -n 10

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-17
SLIDE 17

Preliminaries

Jobs, Tasks & Local Execution

$> ./myprog -n 100

./myprog -n 10

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-18
SLIDE 18

Preliminaries

Jobs, Tasks & Local Execution

./myprog -n 100

$> ./myprog -n 100

./myprog -n 10

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-19
SLIDE 19

Preliminaries

Jobs, Tasks & Local Execution

T1(local) = 100s

./myprog -n 100

$> ./myprog -n 100

./myprog -n 10

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-20
SLIDE 20

Preliminaries

Jobs, Tasks & Local Execution

Job(s): 3, Task(s): 3; T1(local) = 100s

./myprog -n 100

$> ./myprog -n 100

./myprog -n 10

$> ./myprog -n 10

./myprog

$> ./myprog

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-21
SLIDE 21

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-22
SLIDE 22

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

./myprog CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-23
SLIDE 23

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

./myprog -n 10 ./myprog CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-24
SLIDE 24

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

./myprog -n 100 ./myprog -n 10 ./myprog CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-25
SLIDE 25

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

T1(local) = 100s

./myprog -n 100 ./myprog -n 10 ./myprog CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-26
SLIDE 26

Preliminaries

Jobs, Tasks & Local Execution

Job(s): 1, Task(s): 3

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

T1(local) = 100s

./myprog -n 100 ./myprog -n 10 ./myprog CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-27
SLIDE 27

Preliminaries

Jobs, Tasks & Local Execution

# launcher ./myprog ./myprog -n 10 ./myprog -n 100

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-28
SLIDE 28

Preliminaries

Jobs, Tasks & Local Execution

# launcher2 "Run in //:" ./myprog ./myprog -n 10 ./myprog -n 100

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-29
SLIDE 29

Preliminaries

Jobs, Tasks & Local Execution

./myprog -n 10 ./myprog -n 100

T2(local) = 70s

./myprog

# launcher2 "Run in //:" ./myprog ./myprog -n 10 ./myprog -n 100

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-30
SLIDE 30

Preliminaries

Jobs, Tasks & Local Execution

./myprog -n 10 ./myprog -n 100

T2(local) = 70s

./myprog

Job(s): 1, Task(s): 3

# launcher2 "Run in //:" ./myprog ./myprog -n 10 ./myprog -n 100

CPU 1

Core 2 Core 1

Time

8 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
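For reference, a minimal sketch of what the two launchers used above could look like as shell scripts (the slides only show their names and the task list, so the exact contents below are illustrative; myprog stands for any sequential program):

#!/bin/bash
# launcher: run the 3 tasks one after the other (sequential, -> T1)
./myprog
./myprog -n 10
./myprog -n 100

#!/bin/bash
# launcher2 "Run in //": start the 3 tasks in parallel, one per free core (-> T2)
./myprog &
./myprog -n 10 &
./myprog -n 100 &
wait   # return only once all background tasks have completed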

slide-31
SLIDE 31

Preliminaries

Jobs, Tasks & HPC Execution

# launcher: ./myprog; ./myprog -n 10; ./myprog -n 100

[Diagram: Node 1 and Node 2, each with 2 CPUs × 2 cores, over time]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-32
SLIDE 32

Preliminaries

Jobs, Tasks & HPC Execution

Sequential launcher on the HPC nodes: T1(hpc) = T8(hpc) = 120s

[Diagram: the 3 tasks (./myprog, ./myprog -n 10, ./myprog -n 100) run one after the other on a single core of Node 1]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-33
SLIDE 33

Preliminaries

Jobs, Tasks & HPC Execution

Job(s): 1, Task(s): 3; T1(hpc) = T8(hpc) = 120s

[Diagram: as above, with the job and task counters shown]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-34
SLIDE 34

Preliminaries

Jobs, Tasks & HPC Execution

# launcher2 "Run in //:" ./myprog ./myprog -n 10 ./myprog -n 100 Node 1 CPU 1 Core 2 Core 1 CPU 2 Core 4 Core 3 Node 2 CPU 1 Core 2 Core 1 CPU 2 Core 4 Core 3

Time

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-35
SLIDE 35

Preliminaries

Jobs, Tasks & HPC Execution

Parallel launcher on 2 reserved cores: T2(hpc) = 80s

[Diagram: the 3 tasks distributed over 2 cores of Node 1]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-36
SLIDE 36

Preliminaries

Jobs, Tasks & HPC Execution

Job(s): 1, Task(s): 3; T2(hpc) = 80s (parallel launcher, 2 cores)

[Diagram: as above, with the job and task counters shown]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-37
SLIDE 37

Preliminaries

Jobs, Tasks & HPC Execution

Job(s): 1, Task(s): 3; T8(hpc) = 60s (parallel launcher, 8 reserved cores across 2 nodes)

[Diagram: the 3 tasks distributed over the cores of Node 1 and Node 2]

9 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-38
SLIDE 38

Preliminaries

Local vs. HPC Executions

Context | Local PC | HPC
Sequential | T1(local) = 100s | T1(hpc) = T8(hpc) = 120s
Parallel/Distributed | T2(local) = 70s | T2(hpc) = 80s, T8(hpc) = 60s

10 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
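For reference, these timings can be read through the usual speedup definition (not stated explicitly on the slide): the speedup on p cores is S(p) = T1 / Tp. Here S2(local) = 100/70 ≈ 1.43, S2(hpc) = 120/80 = 1.5 and S8(hpc) = 120/60 = 2, far from the ideal S(p) = p, simply because only 3 independent tasks are available.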

slide-39
SLIDE 39

Preliminaries

Local vs. HPC Executions

Context | Local PC | HPC
Sequential | T1(local) = 100s | T1(hpc) = T8(hpc) = 120s
Parallel/Distributed | T2(local) = 70s | T2(hpc) = 80s, T8(hpc) = 60s

Sequential runs WON'T BE FASTER on HPC

→ Reason: processor frequency (typically 3 GHz locally vs 2.26 GHz on the cluster)

10 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-40
SLIDE 40

Preliminaries

Local vs. HPC Executions

Context | Local PC | HPC
Sequential | T1(local) = 100s | T1(hpc) = T8(hpc) = 120s
Parallel/Distributed | T2(local) = 70s | T2(hpc) = 80s, T8(hpc) = 60s

Sequential runs WON'T BE FASTER on HPC

→ Reason: processor frequency (typically 3 GHz locally vs 2.26 GHz on the cluster)

Parallel/Distributed runs DO NOT COME FOR FREE

→ runs will remain sequential even if you reserve ≥ 2 cores/nodes
→ you have to explicitly adapt your jobs to benefit from the multiple cores/nodes (see the OAR sketch below)

10 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
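As a concrete illustration of the last two points, here is a hedged sketch of an OAR session (OAR is the UL HPC batch scheduler presented later; the option syntax follows common OAR usage and launcher2.sh is the illustrative parallel launcher shown earlier, so check the UL HPC documentation for the exact forms):

$> oarsub -I -l nodes=2/core=4,walltime=2:00:00    # interactive job: reserve 2 x 4 cores
$> ./myprog -n 100         # still uses ONE core only: reserving cores gives no speedup by itself
$> cat $OAR_NODEFILE       # one line per reserved core: use it to distribute your tasks yourself
$> exit
$> oarsub -l nodes=2/core=4,walltime=2:00:00 ./launcher2.sh   # batch job running the parallel launcher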

slide-41
SLIDE 41

The UL HPC platform

Summary

1. Preliminaries
2. The UL HPC platform: Overview; Platform Management Tools; Monitoring; Statistics & Milestones
3. Incoming Milestones and Challenges

11 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-42
SLIDE 42

The UL HPC platform

HPC @ UL http://hpc.uni.lu

12 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-43
SLIDE 43

The UL HPC platform

The UL HPC Team

Pascal Bouvry is a full professor at the FSTC and the head of the ILIAS research unit and of the DS-CSCE doctoral school. His team (PCOG) is composed of 25 researchers working on parallel computing and optimization applied to Cloud Computing and HPC (scheduling, energy efficiency, security), ad-hoc networks (VANET simulation and service optimization) and biology (gene sequencing, regulatory networks, protein folding).

Sébastien Varrette, PhD, has been a Research Associate in Prof. Bouvry's team since 2007. Together with Prof. Bouvry, he defined and set up the global HPC initiative of the UL in 2007; in this context, he manages the sysadmin team that maintains and extends the platform. In parallel, his research work focuses on distributed computing platforms (clusters, grids and clouds), with a particular interest in the security and performance evaluation of distributed or parallel executions.

Hyacinthe Cartiaux joined the HPC team in 2011 to set up the Grid'5000 Luxembourg site and has since been involved with all of the UL's HPC infrastructure as well as other external services such as the Gforge. His interests cover IT automation and devops techniques, HPC and Grid Computing.

Valentin Plugaru is an HPC engineer and part of the HPC team since 2014. Since 2012 he has collaborated with Prof. Bouvry's team on research in energy efficiency and performance evaluation of HPC/Cloud environments. His general interests span R&D in High Performance Computing, Grid and Cloud Computing.

Sarah Diehl is a bioinformatician who joined the LCSB BioCore in 2015 as an HPC systems administrator. Her goal is to bridge the gap between researchers and IT specialists. She is experienced in data management, next-generation sequencing analysis and the development of analysis pipelines.

13 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-44
SLIDE 44

The UL HPC platform

UL HPC platforms at a glance (2015)

2 geographical sites, 3 server rooms, 4 clusters: chaos, gaia, granduc, nyx

→ 400 nodes, 4284 cores, 49.918 TFlops
→ incl. 18 dual [GP]GPU nodes
→ 4268.4 TB (raw) shared storage

  • incl. 1.5 PB for backup
  • incl. 1.4 PB (EMC Isilon)

SIU/LCSB/HPC: 4 sysadmins (hpc-sysadmins@uni.lu)

6,340,316 € cumulative HW investment since 2007

→ hardware acquisition only
→ 4,077,913 € excluding server rooms

Open-source software stack

→ SSH, LDAP, OAR, Puppet, Modules...

14 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-45
SLIDE 45

The UL HPC platform

HPC server rooms

2009: CS.43 (Kirchberg campus): 14 racks, 100 m², ≈ 800,000 €
2011: LCSB 6th floor (Belval): 14 racks, 112 m², ≈ 1,100,000 €

15 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-46
SLIDE 46

The UL HPC platform

UL HPC Computing Nodes

Date | Vendor | Proc. Description | #N | #C | Rpeak

chaos
2010 | HP | Intel Xeon L5640@2.26GHz, 2×6C, 24GB | 32 | 384 | 3.472 TFlops
2011 | Dell | Intel Xeon L5640@2.26GHz, 2×6C, 24GB | 16 | 192 | 1.736 TFlops
2012 | Dell | Intel Xeon X7560@2.26GHz, 4×6C, 1TB | 1 | 32 | 0.289 TFlops
2012 | Dell | Intel Xeon E5-2660@2.2GHz, 2×8C, 32GB | 16 | 256 | 4.506 TFlops
2012 | HP | Intel Xeon E5-2660@2.2GHz, 2×8C, 32GB | 16 | 256 | 4.506 TFlops
chaos TOTAL: 81 nodes, 1120 cores, 14.509 TFlops

gaia
2011 | Bull | Intel Xeon L5640@2.26GHz, 2×6C, 48GB | 72 | 864 | 7.811 TFlops
2012 | Dell | Intel Xeon E5-4640@2.4GHz, 4×8C, 1TB | 1 | 32 | 0.307 TFlops
2012 | Bull | Intel Xeon E7-4850@2GHz, 16×10C, 1TB | 1 | 160 | 1.280 TFlops
2013 | Dell | Intel Xeon E5-2660@2.2GHz, 2×8C, 64GB | 5 | 80 | 1.408 TFlops
2013 | Bull | Intel Xeon X5670@2.93GHz, 2×6C, 48GB | 40 | 480 | 5.626 TFlops
2013 | Bull | Intel Xeon X5675@3.07GHz, 2×6C, 48GB | 32 | 384 | 4.746 TFlops
2014 | Delta | Intel Xeon E7-8880@2.5GHz, 8×15C, 1TB | 1 | 120 | 2.4 TFlops
2014 | SGI | Intel Xeon E5-4650@2.4GHz, 16×10C, 4TB | 1 | 160 | 3.072 TFlops
gaia TOTAL: 153 nodes, 2280 cores, 26.96 TFlops

g5k
2008 | Dell | Intel Xeon L5335@2GHz, 2×4C, 16GB | 22 | 176 | 1.408 TFlops
2012 | Dell | Intel Xeon E5-2630L@2GHz, 2×6C, 24GB | 16 | 192 | 3.072 TFlops
granduc/petitprince TOTAL: 38 nodes, 368 cores, 4.48 TFlops

Testing cluster: nyx
2012 | Dell | Intel Xeon E5-2420@1.9GHz, 1×6C, 32GB | 2 | 12 | 0.091 TFlops
2013 | Viridis | ARM A9 Cortex@1.1GHz, 1×4C, 4GB | 96 | 384 | 0.422 TFlops
2015 | HP | Intel E3-1284L v3@1.8GHz, 1×4C, 32GB | 30 | 120 | 3.456 TFlops
nyx/viridis TOTAL: 128 nodes, 516 cores, 3.969 TFlops

16 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
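The Rpeak figures in this table follow from the usual theoretical-peak formula (not spelled out on the slide): Rpeak = number of cores × clock frequency × flops per cycle. As a quick check on the 2010 HP chaos nodes, whose Westmere cores retire 4 double-precision flops per cycle: 32 nodes × 12 cores × 2.26 GHz × 4 ≈ 3471 GFlops ≈ 3.472 TFlops, matching the table entry.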

slide-47
SLIDE 47

The UL HPC platform

UL HPC: General cluster organization

[Diagram: general organization of a site <sitename>: a site access server and an adminfront (Puppet, OAR, Kadeploy, supervision, etc.) manage the site computing nodes, grouped into clusters (A, B) on a fast local interconnect (Infiniband, 10 GbE); a site shared storage area (NFS and/or Lustre disk enclosures) hangs off the same interconnect; a site router (1/10 GbE) links the site to the other clusters' network and the local institution network]

17 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-48
SLIDE 48

The UL HPC platform

Ex: The chaos cluster

Chaos cluster characteristics

  • Computing: 81 nodes, 1124 cores; Rpeak ≈ 14.508 TFlops
  • Storage: 180 TB (NFS) + 180TB (NFS, backup)

Cluster access and management (Uni.lu, Kirchberg)
  • Chaos cluster access: Bull R423 (2U), 2×4c Intel Xeon E5630 @ 2.53 GHz, RAM: 24 GB
  • Adminfront: Dell PE R610 (2U), 2×4c Intel Xeon L5640 @ 2.26 GHz, RAM: 64 GB
  • Interconnect: Infiniband QDR 40 Gb/s (min hop) between nodes; Cisco Nexus C5010 10 GbE uplink towards the LCSB Belval site (gaia cluster) and the Uni.lu network (1/10 GbE)

Storage
  • NFS server: Dell R710 (2U), 2×4c Intel Xeon E5506 @ 2.13 GHz, RAM: 24 GB
  • NetApp E5486 enclosure (180 TB): 60 disks (3 TB SAS 7.2k rpm) = 180 TB (raw); multipathing over 2 controllers (cache mirroring); 6 RAID6 LUNs (8+2 disks) = 144 TB (lvm + xfs); attached over FC8

Computing nodes: CS.43 (416 cores)
  • 1x HP c7000 enclosure (10U): 32 blades HP BL2x220c G6 [384 cores] (2×6c Intel Xeon L5640 @ 2.26 GHz), RAM: 24 GB
  • 1x Dell R910 (4U) [32 cores] (4×6c Intel Xeon X7560 @ 2.26 GHz), RAM: 1 TB

Computing nodes: AS.28 (708 cores)
  • 1x Dell M1000e enclosure (10U): 16 blades Dell M610 [196 cores] (2×6c Intel Xeon L5640 @ 2.26 GHz), RAM: 24 GB
  • 1x Dell M1000e enclosure (10U): 16 blades Dell M620 [256 cores] (2×8c Intel Xeon E5-2660 @ 2.2 GHz), RAM: 32 GB
  • 2x HP SL6500 (8U): 16 blades SL230s Gen8 [256 cores] (2×8c Intel Xeon E5-2660 @ 2.2 GHz), RAM: 32 GB

18 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-49
SLIDE 49

The UL HPC platform

Ex: The gaia cluster

Gaia cluster characteristics

  • Computing: 154 nodes, 2024 cores; Rpeak ≈ 21.7 TFlops
  • Storage: 480 TB (NFS) + 240 TB (Lustre) + 884TB (backup)

Cluster access and management (Uni.lu, Belval)
  • Gaia cluster access: Bull R423 (2U), 2×4c Intel Xeon L5620 @ 2.26 GHz, RAM: 16 GB
  • Adminfront and Columbus server: 2x Bull R423 (2U), 2×4c Intel Xeon L5620 @ 2.26 GHz, RAM: 16 GB
  • Interconnect: Infiniband QDR 40 Gb/s (fat tree) between nodes; Cisco Nexus C5010 10 GbE uplink towards the Kirchberg site (chaos cluster) and the Uni.lu network (1/10 GbE)

NFS storage
  • NFS server #1: Bull R423 (2U), 2×4c Intel Xeon L5630 @ 2.13 GHz, RAM: 120 GB; Nexsan E60 + E60X (240 TB): 120 disks (2 TB SATA 7.2k rpm) = 240 TB (raw); multipathing over 2+2 controllers (cache mirroring); 12 RAID6 LUNs (8+2 disks) = 192 TB (lvm + xfs); attached over FC8
  • NFS server #2: Bull R423 (2U), 2×6c Intel Xeon E5-2620 @ 2.2 GHz, RAM: 160 GB; NetApp E5400 (240 TB): 60 disks (4 TB SAS 7.2k rpm) = 240 TB (raw); multipathing over 2 controllers; 6 RAID6 LUNs (8+2 disks) = 192 TB (lvm + xfs); attached over FC8

Lustre storage
  • MDS1, MDS2: 2x Bull R423 (2U), 2×4c Intel Xeon L5630 @ 2.13 GHz, RAM: 96 GB; Nexsan E60 (4U, 12 TB): 20 disks (600 GB SAS 15k rpm), multipathing over 2 controllers (cache mirroring), 2 RAID1 LUNs (10 disks), 6 TB (lvm + lustre)
  • OSS1, OSS2: 2x Bull R423 (2U), 2×4c Intel Xeon L5630 @ 2.13 GHz, RAM: 48 GB; 2x Nexsan E60 (2×4U, 2×120 TB): 2×60 disks (2 TB SATA 7.2k rpm) = 240 TB (raw); multipathing over 2 controllers (cache mirroring); 2×6 RAID6 LUNs (8+2 disks) = 2×96 TB (lvm + lustre); attached over FC8

Computing nodes: LCSB Belval, 151 nodes (2000 cores), Infiniband QDR 40 Gb/s (fat tree)
  • 1x BullX BCS enclosure (6U): 4 BullX S6030 [160 cores] (16×10c Intel Xeon E7-4850 @ 2 GHz), RAM: 1 TB
  • 1x Dell R820 (4U) [32 cores] (4×8c Intel Xeon E5-4640 @ 2.4 GHz), RAM: 1 TB
  • 9x Bullx B enclosures (63U): 132 BullX B500 [1584 cores]: 60 (2×6c Intel Xeon L5640 @ 2.26 GHz), 40 (2×6c Intel Xeon X5670 @ 2.93 GHz), 32 (2×6c Intel Xeon X5675 @ 3.07 GHz), RAM: 48 GB
  • 15 BullX B505 [144 cores]: 12 (2×6c Intel Xeon L5640 @ 2.26 GHz), RAM: 96 GB; 12 (2×4c Intel Xeon E5640 @ 2.66 GHz), RAM: 24 GB; with 20 GPGPU accelerators [12032 GPU cores]: 4 Nvidia Tesla M2070 [448c], 20 Nvidia Tesla M2090 [512c]
  • 5x Dell R720 (10U) [80 cores] (2×8c Intel Xeon E5-2660 @ 2.2 GHz), RAM: 64 GB; with 5 GPGPU accelerators [13440 GPU cores]: 5 Nvidia K20m [2688c]

19 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-50
SLIDE 50

The UL HPC platform

Ex: Some racks of the gaia cluster

20 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-51
SLIDE 51

The UL HPC platform

UL HPC Software Stack

Operating System: Linux Debian (CentOS on the storage servers)
Remote connection to the platform: SSH
User SSO: OpenLDAP-based
Resource management: job/batch scheduler: OAR
(Automatic) computing node deployment:

→ FAI (Fully Automatic Installation) (chaos, gaia, nyx only)
→ Puppet
→ Kadeploy (granduc, petitprince / Grid'5000 only)

Platform monitoring: OAR Monika, OAR Drawgantt, Ganglia, Nagios, Puppet Dashboard, etc.
Commercial software:

→ Intel Cluster Studio XE, TotalView, Allinea DDT, Stata, etc.

21 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
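A hedged sketch of how a user typically reaches the platform through this stack (the access host name, port and user name below are illustrative placeholders; the actual values are documented on http://hpc.uni.lu):

$> ssh -p 8022 jdoe@access-gaia.uni.lu    # SSH to a cluster access (front-end) node
$> oarstat -u jdoe                        # list your current OAR jobs on that cluster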

slide-52
SLIDE 52

The UL HPC platform

HPC in the Grande region and Around

Country | Name/Institute | #Cores | Rpeak (TFlops) | Storage (TB) | Manpower (FTEs)
Luxembourg | UL | 4284 | 49.918 | 4268.4 | 4
Luxembourg | CRP GL | 800 | 6.21 | 144 | 1.5
France | TGCC Curie, CEA | 77184 | 1667.2 | 5000 | n/a
France | LORIA, Nancy | 3724 | 29.79 | 82 | 5.05
France | ROMEO (GPU*), Reims | 5720 | 254.9* | 245 | 2
Germany | Juqueen, Juelich | 393216 | 5033.2 | 448 | n/a
Germany | MPI, RZG | 2556 | 14.1 | n/a | 5
Germany | URZ (bwGrid), Heidelberg | 1140 | 10.125 | 32 | 9
Belgium | UGent, VCS | 4320 | 54.541 | 82 | n/a
Belgium | CECI, UMons/UCL | 2576 | 25.108 | 156 | > 4
UK | Darwin, Cambridge Univ | 9728 | 202.3 | 20 | n/a
UK | Legion, UCLondon | 5632 | 45.056 | 192 | 6
Spain | MareNostrum, BCS | 33664 | 700.2 | 1900 | 14

22 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-53
SLIDE 53

The UL HPC platform

Software/Modules Management

RESIF: Revolutionary EasyBuild-based Software Installation Framework

→ Automatic management of Environment Modules deployment
→ Fully automates software builds and supports all available toolchains
→ Clean (hierarchical) modules layout to facilitate its usage
→ "Easy to use"

Cf. tutorials: http://ulhpc-tutorials.readthedocs.org/

23 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
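In practice, the software deployed through RESIF/EasyBuild is exposed as Environment Modules; a typical session looks like the following sketch (the module name used here is illustrative; the actual list depends on what is deployed):

$> module avail                  # list the modules deployed by RESIF
$> module load toolchain/goolf   # load a compiler/MPI toolchain (illustrative name)
$> module list                   # show the currently loaded modules
$> module purge                  # unload everything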

slide-54
SLIDE 54

The UL HPC platform

Software/Modules Management

24 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-55
SLIDE 55

The UL HPC platform

BIO Workflow Management

Galaxy Portal galaxy-server.uni.lu

→ web-based platform for data-intensive biomedical research.

25 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-56
SLIDE 56

The UL HPC platform

Platform Monitoring

General Live Status

http://hpc.uni.lu/status/overview.html 26 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-57
SLIDE 57

The UL HPC platform

Platform Monitoring

Monika

http://hpc.uni.lu/{chaos,gaia,g5k}/monika 26 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-58
SLIDE 58

The UL HPC platform

Platform Monitoring

Drawgantt

http://hpc.uni.lu/{chaos,gaia,g5k}/drawgantt 26 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-59
SLIDE 59

The UL HPC platform

Platform Monitoring

Ganglia

http://hpc.uni.lu/{chaos,gaia,g5k}/ganglia 26 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-60
SLIDE 60

The UL HPC platform

Platform Monitoring

CDash

http://cdash.uni.lu/ 26 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-61
SLIDE 61

The UL HPC platform

UL HPC Key numbers Summary

400 nodes, 4284 cores, 49.918 TFlops

→ Mostly Intel-based architecture
→ Multiple vendors (Bull, HP, Dell, Delta, SGI)

4268.4 TB shared storage

→ Based on NetApp / NexSAN / Certon disk enclosures
→ Home directories / projects (NFS, GPFS, OneFS): 2.36 PB
→ Scratch (Lustre): 0.48 PB
→ Backup (NFS, GlusterFS): 1.4 PB

6,340,316 € cumulative HW investment since 2007

281 registered users

27 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-62
SLIDE 62

The UL HPC platform

Registered Users

[Chart: evolution of the number of registered users on the UL internal clusters, Jan 2008 to Jan 2014, broken down by affiliation: LCSB (Bio-Medicine), URPM (Physics and Material Sciences), FDEF (Law, Economics and Finance), RUES (Engineering Science), SnT (Security and Trust), CSC (Computer Science and Communications), LSRU (Life Sciences), Bachelor and Master students, Others]

28 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-63
SLIDE 63

The UL HPC platform

CPU-year usage since 2008

CPU-hour: work done by a CPU in one hour of wall-clock time

Yearly platform CPU usage (in CPU-years): 2008: 4, 2009: 13, 2010: 56, 2011: 378, 2012: 612, 2013: 1067, 2014: 1417

29 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
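For scale (simple arithmetic, not taken from the slide): one CPU-year is 365 × 24 = 8760 CPU-hours, so the 1417 CPU-years consumed in 2014 correspond to roughly 1417 × 8760 ≈ 12.4 million CPU-hours.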

slide-64
SLIDE 64

The UL HPC platform

A Year on Gaia...

[Chart: effective daily core usage vs. core availability on the gaia cluster, January 2012 to July 2014]

30 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-65
SLIDE 65

The UL HPC platform

Chronological Statistics

31 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-66
SLIDE 66

The UL HPC platform

Chronological Statistics

31 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-67
SLIDE 67

The UL HPC platform

Chronological Statistics

UL HPC yearly investment, excluding server rooms (per type: computing nodes, storage, servers, interconnect, software, other/support):

2006: 21,850.00 € | 2007: 111,168.20 € | 2008: 74,773.78 € | 2009: 99,998.77 € | 2010: 323,478.47 € | 2011: 926,777.35 € | 2012: 917,647.50 € | 2013: 707,338.04 € | 2014: 865,458.42 €

31 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-68
SLIDE 68

The UL HPC platform

Research Domains and Usage

Research Domains

(Among the 288 registered users)

Research Areas that currently benefit from UL HPC platforms:

→ Security ([ad-hoc] networks, FT, Grid, Cloud, etc.)
→ Mechanical Engineering
→ Physics, Geo-Physics
→ [Multi-Objective] Optimization ([robust] task scheduling, etc.)
→ Cryptology
→ Economy
→ Life Sciences

32 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-69
SLIDE 69

Incoming Milestones and Challenges

Summary

1. Preliminaries
2. The UL HPC platform: Overview; Platform Management Tools; Monitoring; Statistics & Milestones
3. Incoming Milestones and Challenges

33 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-70
SLIDE 70

Incoming Milestones and Challenges

2015 Milestones

Cf. Newsletter: OS/system upgrade; storage / portal consolidation

→ No way to further extend the HW equipment
→ QoS; establish UL as national HPC Center of Excellence

Coming soon (2015 / 2016)

Belval Centre de Calcul (CDC)

→ 5 new server rooms (3 storage, 2 HPC)
→ Pending discussions with the Fonds Belval to re-justify everything

Obj.: prepare 2 rooms (1 HPC, 1 storage) by 2020

→ Budget: ≈ 4 M€ per year

34 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-71
SLIDE 71

Incoming Milestones and Challenges

UL HPC Planning 2015-2020

[Chart: UL HPC computing capacity added in the CDC S-02 server room over 2015-2020, growing in steps from 65.09 to 130.18, 442.60, 755.02 and finally 1080.46 TFlops]

35 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-72
SLIDE 72

Incoming Milestones and Challenges

UL HPC Planning 2015-2020

[Chart: UL HPC raw storage capacity added in the CDC S-02 server room over 2015-2020, growing in steps from 2.88 to 6.48, 10.8, 15.12 and finally 19.44 PB]

35 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-73
SLIDE 73

Incoming Milestones and Challenges

UL HPC Planning 2015-2020

Funding the CDC is mandatory

→ Data-centre providers (EBRC, LuxConnect, etc.) are not adapted: they do not have HPC-ready cabinets (80 kW/rack), so the proposed renting price is prohibitive
→ Cloud platforms (Amazon, etc.) can only absorb part of the needs

Computational Science initiative

→ part of the Digital Strategy of the UL
→ pending PRIDE call / Doctoral School, etc.

National HPC initiative

→ discussions in progress (MECE, MESR)
→ Obj.: national HPC Center of Excellence (CoE)

36 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-74
SLIDE 74

Incoming Milestones and Challenges

Cost Model

UL HPC platform funding should evolve

→ transition from a free service model to a mixed model with paying and non-paying tiers
→ key for providing HPC services at the national level

You will also have to budget your HPC usage in new project proposals

37 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-75
SLIDE 75

Incoming Milestones and Challenges

Cost Model

UL HPC platform funding should evolve

→ transition from a free service model to a mixed model with paying and non-paying tiers
→ key for providing HPC services at the national level

You will also have to budget your HPC usage in new project proposals

Cost policy

No charge to the actors of the public research sector

→ only for internal research projects
→ UL Research Units & ICs, LIST, LISER, LIH

The same actors will be charged for externally funded projects

→ FNR, European projects, projects with industry

37 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-76
SLIDE 76

Incoming Milestones and Challenges

Cost Model

Pricing units are in the form of usage credits

→ under a monthly accounting period

Two types of credits:

1. 1 computing credit of class "X" = 1 CPU core for 1 hour on a resource of class "X"
2. 1 storage credit = 1 TB of storage for 1 month

3 storage credits (thus at most 3 TB) are free each month; additional credits: 1000 €

Class | Description | Credit price
normal | Regular HPC resource | 0.33 €
bigmem | Regular HPC resource with huge RAM (≥ 1024 GB) | 1.48 €
bigsmp | SMP node (≥ 16 sockets) with huge RAM (≥ 1024 GB) | 2.45 €

38 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL
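To see what these credit prices mean for a typical job (a worked example, not from the slide): a run reserving 32 cores of a normal node for 48 hours consumes 32 × 48 = 1536 computing credits, i.e. 1536 × 0.33 € ≈ 507 €; the same reservation on a bigmem node would cost 1536 × 1.48 € ≈ 2273 €.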

slide-77
SLIDE 77

Incoming Milestones and Challenges

Computing Credits Prices

[Chart: computing credit price per core-hour (in €): UL HPC operating cost vs. the equivalent Amazon EC2 cost, for the node classes d-cluster1, e-cluster1, h-cluster1, r-cluster1, s-cluster1, gaia-[1-60], gaia-[61-62], gaia-[63-72], gaia-73, gaia-74, gaia-[75-79], gaia-[83-122] and gaia-[123-154]; values range from 0.10 € to 8.30 €]

39 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-78
SLIDE 78

Incoming Milestones and Challenges

Computing Credits Prices

Class | UL HPC operating cost | EC2 equivalent
Normal | 0.33 € | 1.10 €
BigMem | 1.48 € | 7.49 €
BigSMP | 2.45 € | 8.54 €

(price in € per computing credit, i.e. per core-hour)

39 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL

slide-79
SLIDE 79

Thank you for your attention...

Questions?

Sébastien Varrette, PhD
mail: sebastien.varrette@uni.lu
Office E-007, Campus Kirchberg
6, rue Coudenhove-Kalergi
L-1359 Luxembourg

UL HPC Management Team
mail: hpc-sysadmins@uni.lu

1. Preliminaries
2. The UL HPC platform: Overview; Platform Management Tools; Monitoring; Statistics & Milestones
3. Incoming Milestones and Challenges

40 / 40 Sébastien Varrette, PhD (UL HPC, PCOG, CSC Research Unit) High Performance Computing (HPC) at UL