

SLIDE 1

HPC Cloud

Floris Sluiter, SARA computing & networking services

SLIDE 2

About SARA, NCF and BiG Grid

  • The foundation for National Compute Facilities (NCF) is part of NWO, the Dutch government organization for scientific research.
  • The BiG Grid project is a collaboration between NCF, Nikhef and NBIC, and enables access to grid infrastructures for scientific research in the Netherlands.
  • SARA is a national High Performance Computing and e-Science support center in Amsterdam, and the primary operational partner of BiG Grid.
SLIDE 3

SARA Project involvements

SLIDE 4

SARA Scientific Infrastructure and support

  • High Performance Computing: Huygens, GPU cluster, Lisa, Grid, Hadoop, HPC Cloud
  • High Resolution Visualization: Tiled Panel Display, Remote Visualization
  • High Performance Networking: SURFnet 6, AMS-IX, NetherLight
  • Mass Storage: 2 × 10 Petabyte tape archive, 4 Petabyte disk storage

SLIDE 5

Scientific Computing facilities SARA (Specs)

Huygens (national supercomputer):
  • Power6, 3328 cores in 105 nodes
  • 15.25 TB of memory, InfiniBand 160 Gbit/s
  • 700 TB of disk space, 60 TFlop/s

Lisa (national compute cluster):
  • Intel, 4480 cores in 512 nodes
  • 12 TB of memory, InfiniBand 20 Gbit/s
  • 50 TB of disk space, 20 TFlop/s

Grid resources:
  • Intel, 2400 cores in 2400 nodes
  • 5 TB of memory, 125 Mbit/s (1 Gbit/s burst) Ethernet
  • 3.5 PB of disk space, 4 PB tape, 30K SPECints

Innovative infrastructure:
  • Hadoop, CDMI, WebDAV, iRODS, ClearSpeed

HPC Cloud (expected):
  • AMD/Intel, 512 cores in 16 flexible nodes
  • 4 TB of memory, 8 × 10 Gbit/s Ethernet
  • 300 TB of disk space, 10K SPECints

GPU cluster (part of Lisa):
  • Tesla GPUs, 2000 cores in 8 nodes
  • 32 GB of memory (total for the GPUs)
  • InfiniBand 20 Gbit/s, 2 TB of disk space, 7 TFlop/s

SLIDE 6

HPC Cloud

Philosophy

HPC cloud computing: self-service, dynamically scalable computing facilities. Cloud computing is not about new technology; it is about new uses of technology.

SLIDE 7

HPC Cloud: Concepts

Laptop → broom closet cluster → HPC cloud: one environment, same image.

Images:

  • Software
  • Libraries
  • Batch system
  • "Clone my laptop!"

HPC hardware:

  • No overcommitting (reserved resources)
  • Secured environment and network
  • User is able to fully control their resources (VM start/stop, OS, applications, resource allocation)
  • Developed together with users
SLIDE 8

...At AMAZON?

  • Cheap?
    – Quadruple Extra Large = 8 cores and 64 GB RAM: $2.00/h (or, reserved, $5,300/y + $0.68/h)
    – 1024 cores = $2,242,560/y (or, reserved, $678k + $760k = $1,400k/y)
  • Bandwidth = extra
  • Storage = extra
  • I/O guarantees?
  • Support?
  • Secure (no analysis/forensics)?
  • High Performance Computing?
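A quick sanity check of those figures, as a back-of-the-envelope Python sketch (assuming the 2011 list prices quoted above, 8,760 hours per year, and 128 eight-core instances to reach 1024 cores):

```python
# Back-of-the-envelope EC2 cost check using the 2011 list prices quoted
# above; the instance count and hours per year are the only other inputs.
HOURS_PER_YEAR = 24 * 365            # 8760
instances = 1024 // 8                # 128 Quadruple Extra Large, 8 cores each

on_demand = instances * 2.00 * HOURS_PER_YEAR
reserved_upfront = instances * 5300
reserved_hourly = instances * 0.68 * HOURS_PER_YEAR

print(f"on demand: ${on_demand:,.0f}/y")                        # ~$2,242,560
print(f"reserved:  ${reserved_upfront:,.0f} + ${reserved_hourly:,.0f}/y")
```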
SLIDE 9

Users of Scientific Computing

  • High energy physics;
  • atomic and molecular physics (DNA);
  • life sciences (cell biology);
  • human interaction (all human sciences, from linguistics to even phobia studies);
  • from the big bang;
  • to astronomy;
  • science of the solar system;
  • earth (climate and geophysics);
  • into life and biodiversity.

Slide courtesy of prof. F. Linde, Nikhef

SLIDE 10

(current) Users of HPC Cloud Computing

  • High energy physics;
  • atomic and molecular physics (DNA);
  • life sciences (cell biology);
  • human interaction (all human sciences, from linguistics to even phobia studies);
  • from the big bang;
  • to astronomy;
  • science of the solar system;
  • earth (climate and geophysics);
  • into life and biodiversity.

Slide courtesy of prof. F. Linde, Nikhef

SLIDE 11

HPC (Cloud) Application types

| Type | Examples | Requirements |
|---|---|---|
| Compute intensive | Monte Carlo simulations, parameter optimizations, etc. | CPU cycles |
| Data intensive | Signal/image processing in astronomy, remote sensing, medical imaging, DNA matching, pattern matching, etc. | I/O to data (SAN, file servers) |
| Communication intensive | Particle physics, MPI, etc. | Fast interconnect network |
| Memory intensive | DNA assembly, etc. | Large (shared) RAM |
| Continuous services | Databases, web servers, web services | Dynamically scalable |

SLIDE 12

The product: Virtual Private HPC Cluster

We (plan to) offer:

  • Fully configurable HPC cluster (a cluster from scratch):
    – Fast CPU
    – Large memory (64 GB per 8 cores)
    – High bandwidth (10 Gbit/s)
    – Large and fast storage (400 TB)
  • Users will be root inside their own cluster
  • Free choice of OS, etc.
  • And/or use existing VMs: examples, templates, clones of a laptop, downloaded VMs, etc.
  • Public IP possible (subject to a security scan)

Platform and tools:

  • Redmine collaboration portal
  • Custom GUI (open source)
  • OpenNebula + custom add-ons
  • CDMI storage interface
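Since provisioning is built on OpenNebula, requesting one node of such a cluster looks roughly like the sketch below. This is a minimal illustration, not SARA's actual template: the endpoint, credentials, image and network names are invented, and the exact XML-RPC signature depends on the OpenNebula version.

```python
import xmlrpc.client

# Minimal sketch of allocating one virtual cluster node through
# OpenNebula's XML-RPC API. All names and credentials are illustrative.
one = xmlrpc.client.ServerProxy("http://one.example.org:2633/RPC2")

template = """
NAME   = "hpc-node-01"
VCPU   = 8
CPU    = 8
MEMORY = 65536                      # 64 GB, matching the 64GB/8-core offer
DISK   = [ IMAGE = "ubuntu-hpc" ]   # hypothetical image name
NIC    = [ NETWORK = "cluster-private" ]
"""

# one.vm.allocate(session, template, on_hold) returns a result array whose
# first element is a success flag; details vary per OpenNebula version.
resp = one.one.vm.allocate("user:password", template, False)
print("VM id:" if resp[0] else "error:", resp[1])
```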
SLIDE 13

Physical architecture (testbed)

SLIDE 14

Virtual architecture

SLIDE 15

Virtual architecture cont...

SLIDE 16

Virtual architecture cont...

SLIDE 17

Virtual architecture cont...

SLIDE 18

Virtual architecture User view

SLIDE 19

Project Development Goals

  • Physical architecture
    – HPC cloud needs high I/O capabilities
    – Performance tuning: optimize hardware & software
    – Scheduling
  • Usability
    – Interfaces
    – Templates
    – Documentation & education
    – Involve users in pre-production (!)
  • Security
    – Protect the user against self, fellow users and the world, and vice versa!
    – Enable users to share private data and templates
    – Self-service interface: the user specifies "normal network traffic", ACLs & firewall rules
    – Monitoring, monitoring, monitoring! There is no control over the contents of a VM, so monitor its ports, network and communication patterns

ROADMAP:
1) SARA innovation project in 2009
2) Pre-production for BiG Grid in 2010
3) Production infrastructure in summer 2011
4) Development continues in 2011/2012

SLIDE 20

A bit of Hard Labour

SLIDE 21

User collaboration Portal

  • Redmine (www.redmine.org)
SLIDE 22


Self Service GUI

Developed at SARA. Open source, available at www.opennebula.org.

SLIDE 23

Monitoring workload

SLIDE 24

Standards: OCCI + CDMI + OVF + CNMI = CMI

SLIDE 25

Development plans/effort @ SARA

  • Storage
    – CDMI server application
  • Network
    – Dynamic provisioning
    – QoS
    – ACL/firewall rules
    – Dynamic DNS
    – "CNMI"
    – Network benchmarking
  • Compute
    – OCCI server with AAA?
  • GUI
    – New & improved, on OCCI/CDMI
  • Security
    – Flow analysis
    – Dynamic ACL/firewall

SLIDE 26

CDMI server + client

  • CDMI server (to be released in 2011)
    – Backend = Linux, POSIX compliant
    – ACLs mapped onto groups
    – C++
    – Will be open source (license pending)
    – REST over HTTP (objects pending)
    – All features except queues
  • CDMI client (released in 2010)
    – FUSE
    – C++
    – Open source (GPL)
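For a sense of what the interface looks like: CDMI is plain REST over HTTP, so creating a container amounts to a single PUT. A minimal sketch, assuming Python with the requests library and an invented endpoint:

```python
import requests  # third-party HTTP library, assumed installed

# Illustrative CDMI 1.0 container creation; endpoint and container
# name are made up, not SARA's actual service.
headers = {
    "X-CDMI-Specification-Version": "1.0",
    "Content-Type": "application/cdmi-container",
    "Accept": "application/cdmi-container",
}
resp = requests.put(
    "https://cdmi.example.org/myContainer/",
    json={"metadata": {}},
    headers=headers,
)
print(resp.status_code)   # 201 Created on success
```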

SLIDE 27

Real world network virtualization tests with qemu/KVM

  • 20 Gbit/s DDR InfiniBand (IPoIB) is compared with 1 Gbit/s Ethernet and 10 Gbit/s Ethernet
  • Virtual network bridged to the physical network (needed for user separation)
  • "Real-world" tests performed on a non-optimized system
  • Results (raw link speed in parentheses):
    – 1GE: 0.92 Gbit/s (1 Gbit/s)
    – IPoIB: 2.44 Gbit/s (20 Gbit/s)
    – 10GE: 2.40 Gbit/s (10 Gbit/s)
  • Bottleneck: the virtio driver
  • Likely solution: SR-IOV
  • Full report on www.cloud.sara.nl
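The slide does not name the benchmark tool used (iperf is the usual choice); the sketch below only illustrates the general shape of such a memory-to-memory TCP throughput test.

```python
# Minimal sketch of a memory-to-memory TCP throughput test.
# Run "recv" on one host first, then "send <host>" on the other.
import socket, sys, time

BUF = 1 << 20          # 1 MiB buffer
PORT = 5001

def receiver():
    srv = socket.socket()
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    total, t0 = 0, time.time()
    while True:
        data = conn.recv(BUF)
        if not data:
            break
        total += len(data)
    dt = time.time() - t0
    print(f"{total * 8 / dt / 1e9:.2f} Gbit/s")

def sender(host, seconds=10):
    cli = socket.socket()
    cli.connect((host, PORT))
    payload = b"\0" * BUF
    end = time.time() + seconds
    while time.time() < end:
        cli.sendall(payload)
    cli.close()

if __name__ == "__main__":
    sender(sys.argv[2]) if sys.argv[1] == "send" else receiver()
```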
SLIDE 28

User participation: ~30 users involved in testing

Projects (title; resources; objective; group/institute):

  • Cloud computing for a multi-method perspective study of the construction of (cyber)space and place; 2000(+) core hours, 75-100 GB; analyse 20 million geocoded Flickr data points; UvA, GPIO institute
  • Urban Flood Simulation; 1500 core hours, 1 GB; assess cloud technology potential and efficiency of the ported Urban Flood simulation modules; UvA, Computational Science
  • Cloud computing for sequence assembly; 14 samples × 2 VMs × 2-4 cores × 2 days ≈ 5000 core hours; run a set of prepared VMs for different, specific sequence assembly tasks; Bacterial Genomics, CMBI Nijmegen
  • A user-friendly cloud-based inverse modelling environment; further develop a user-friendly desktop environment running in the cloud that supports modelling, testing and large-scale model runs; Computational Geo-ecology, UvA
  • Real-life HPC cloud computing experiences for microarray analyses; test, develop and gain real-life experience using VMs for microarray analysis; Microarray Department, Integrative BioInformatics Unit, UvA
  • Customized pipelines for the processing of MRI brain data; up to 1 TB of data, transferred out quickly; configure a customized virtual infrastructure for MRI image processing pipelines; Biomedical Imaging Group, Erasmus MC, Rotterdam
  • Cloud computing for historical map collections: access and georeferencing; 7 VMs of 500 GB = 3.5 TB; set up a distributed, decentralized, autonomous georeferencing data delivery system
  • Parallelization of MT3DMS for modelling contaminant transport at large scale; 64 cores, scaling experiments, ~80 hours = 5000 core hours; investigate massively parallel scaling for code speed-up
  • An imputation pipeline on GridGain; estimate the execution time of existing bioinformatics pipelines, in particular heavy imputation pipelines, on a new HPC cloud; Groningen Bioinformatics Center, University of Groningen
  • Regional Atmospheric Soaring Prediction; demonstrate how cloud computing eliminates porting problems; Computational Geo-ecology, UvA
  • Extraction of social signals from video; 160 core hours, 630 GB; video feature extraction; Pattern Recognition Laboratory, TU Delft
  • Analysis of next-generation sequencing data from mouse tumors; 150-300 GB; run an analysis pipeline to create a mouse model for genome analysis; Chris Klijn, NKI
  • Further projects with the Department of Geography (UvA), Deltares (1 TB), and others

Projects by field:

| Field | # projects |
|---|---|
| Bioinformatics | 8 |
| Ecology | 3 |
| Geography | 3 |
| Computer science | 5 |
| Linguistics | 4 |
| Other | 7 |

SLIDE 29

Example Project 1

  • Medical data: MRI image processing pipeline
     Cluster with custom imaging software
     Dynamic scaling up depending on the load
     Added 1 VM with a web service for user access, data upload and download

Pictures from H. Vrooman, Erasmus MC
SLIDE 30

Example project 2

  • Life sciences, gene expression: microarray analysis
     Data analysis using R (statistical analysis) with a specialized plugin
     Version of R, plugin and OS under the control of the users
     Virtual machine images with 4/8 cores and 4/8 GB RAM
     Up to 64 cores used
     Over a 10-week period: 30,000 core-hours; a typical study uses 5,000

SLIDE 31

User Experience

(slides from Han Rauwerda, transcriptomics, UvA)

Microarray analysis: calculation of F-values in a 36 × 135k transcriptomics study using 5000 permutations on 16 cores:
  • worked out of the box (including the standard cluster logic)
  • no indication of large overhead

Ageing study, conditional correlation:
  • dr. Martijs Jonker (MAD/IBU), prof. van Steeg (RIVM), prof. dr. v.d. Horst and prof. dr. Hoeymakers (EMC)
  • 6 timepoints, 4 tissues, 3 replicates and 35k measurements, plus pathological data
  • Question: find the per-gene correlation with pathological data (staining)
  • Spearman correlation conditional on chronological age (not normally distributed)
  • p-values through 10k permutations (4000 core hours per tissue)

Co-expression network analysis:
  • 6k × 6k correlation matrix (conditional on chronological age)
  • calculation of this matrix is parallelized (5,000 core hours per tissue)
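To make the parallel decomposition concrete: a matrix of this size splits naturally into independent row blocks, one per job, roughly as in the sketch below (placeholder data and plain Pearson correlation; the study itself used Spearman correlation conditional on age).

```python
import numpy as np

# Sketch of a blockwise 6k x 6k correlation matrix. Each row block is
# independent, so blocks can run as separate jobs on cluster nodes.
n_genes, n_samples, block = 6000, 72, 500   # 72 = 6 timepoints x 4 tissues x 3 replicates
expr = np.random.rand(n_genes, n_samples)   # placeholder expression data

# Normalize rows so that a dot product equals the Pearson correlation.
z = expr - expr.mean(axis=1, keepdims=True)
z /= np.linalg.norm(z, axis=1, keepdims=True)

corr = np.empty((n_genes, n_genes))
for start in range(0, n_genes, block):      # in practice: one job per block
    corr[start:start + block] = z[start:start + block] @ z.T
```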

Development happened during the testing period (real life!)

Conclusions:
  • Many ideas were tried (clusters with 32-64 cores)
  • The cloud cluster behaves like a real cluster
  • Virtually no hiccups of the system, no waiting times
  • User: "it is a very convenient system"

SLIDE 32

Usage statistics in beta phase

Users liked it:

  • ~90,000 core-hours used in the first 10 weeks (~175,000 available)
  • currently 500k core-hours
  • 50% occupation during beta testing; currently 80-90%
  • Science is being done!
  • Some pioneers paved the way for the rest ("Google" launch approach)

SLIDE 33

Observations

  • Usage: a scientific programmer prepares the environment, the scientist uses it
  • Several "heterogenic clusters": Microsoft instances combined with Linux
  • Modest parallelism (maximum 64 cores)
  • User wishlist: the possibility to share a collection of custom-made virtual machines with other users
  • Added value: support by your trusted HPC centre
  • HPC cloud on HPC hardware is a necessary addition to a complete HPC ecosystem

SLIDE 34

Advantages of HPC Cloud

  • Only small overhead from virtualization (5%)
  • Easy or no porting of applications
  • Applications with different requirements can co-exist on the same physical host
  • Long-running services (for example databases)
  • Tailored computing
  • Service cost shifts from manpower to infrastructure
  • Usage cost in HPC stays pay-per-use
  • Time to solution shortens for many users
SLIDE 35

BiG Grid HPC cloud in international media

SLIDE 36

Acknowledgements

Our sponsor: NL-BiGGrid. Our brave & entrepreneurial beta users. And the HPC Cloud team: Jhon Masschelein, Tom Visser, Dennis Blommesteijn, Neil Mooney, Jeroen Nijhof, Alain van Hoof, Floris Sluiter et al. http://www.cloud.sara.nl

SLIDE 37

Demo

SLIDE 38

Thank you!

Questions?

www.cloud.sara.nl

photo: http://cloudappreciationsociety.org/
SLIDE 39

Scientific Computing Facilities SARA

SLIDE 40

What is a Cloud?

[National Institute of Standards and Technology (NIST)]

[http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc]

  • Resource pooling
     Multiple concurrent users on a shared system
  • Broad network access
     Accessible from the Internet
  • Measured service
     Pay per use
  • Rapid elasticity
     Capabilities scaled up and down dynamically (pay-as-you-go)
  • On-demand self-service
     The user is in full control

SLIDE 41

Is a Compute Centre a Cloud?

[National Institute of Standards and Technology (NIST)]

[http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc]

  • Resource pooling: Yes
     Multiple concurrent users on a shared system
  • Broad network access: Yes
     Accessible from the Internet
  • Measured service: Yes
     Pay per use
  • Rapid elasticity: Some
     Capabilities scaled up and down dynamically (pay-as-you-go), but only within a pre-allocation
  • On-demand self-service: No
     No control over the OS, and adding resources is not trivial

SLIDE 42

High Performance Computing Application Parallelization

Task Parallelization Data Parallelization

SLIDE 43

High Performance Computing Parallelization: Amdahl's Law

The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program. For example, if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing is 20×, as shown in the diagram, no matter how many processors are used.
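In formula form, with p the parallelizable fraction and N the number of processors:

```latex
S(N) = \frac{1}{(1 - p) + p/N},
\qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
% for p = 0.95 the ceiling is 1 / 0.05 = 20, the 20x quoted above
```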

SLIDE 44

HPC Cluster vs HPC Cloud

SLIDE 45

Virtual Machines in a HPC Cloud

SLIDE 46

Discussion

  • Why invest in HPC cloud computing?
    – It is flexible, self-service and dynamically scalable.
  • Is the HPC cloud beneficial to research?
    – Yes.
  • What is the impact of cloud computing on existing research infrastructures?
    – We find it to be complementary to our other facilities.
  • Will grids evolve into cloud computing provision?
    – Grid is already "Computing as a Service"...
    – The question is: will clouds offer grid-type services?