Behind the scenes of a FOSS-powered HPC cluster at UCLouvain


slide-1
SLIDE 1

Behind the scenes of a FOSS-powered HPC cluster at UCLouvain

Ansible or Salt? Ansible AND Salt!

Damien François | Université catholique de Louvain - CISM FOSDEM '18 | HPC, Big Data & Data Science Devroom | 2018-02-04

slide-2
SLIDE 2

UCL, CISM

slide-3
SLIDE 3

Manneback cluster

  • Grows organically: 1 to 10 machines at a time
  • Now 4000 cores, Gb + 10Gb, 50 TB storage
  • 100 local users + CMS grid, ~2 M jobs per year

slide-4
SLIDE 4

We started “manually”... and gradually improved automation:

  • check-list (make persistent)
  • shell script (make actionable)
  • config. management (make idempotent)
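The step from "shell script" to "config. management" hinges on idempotence: running the same operation twice must leave the system in the same state as running it once. A minimal sketch of the difference, where the function names `append_line` and `ensure_line` are hypothetical and chosen only for illustration:

```python
import os


def append_line(path, line):
    """Shell-script style: blindly append, so a second run duplicates the line."""
    with open(path, "a") as handle:
        handle.write(line + "\n")


def ensure_line(path, line):
    """Config-management style: describe the desired state.

    Returns True if the file was changed, False if it was already correct.
    """
    text = ""
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False  # desired state already reached, nothing to do
    with open(path, "a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # avoid gluing onto an unterminated last line
        handle.write(line + "\n")
    return True
```

Calling `ensure_line` any number of times leaves exactly one copy of the line in the file, which is the property configuration-management tools provide for every resource they handle.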

slide-5
SLIDE 5

We settled on three tools for the deployment of new nodes

slide-6
SLIDE 6

Unboxing

  • Label, rack, connect
  • Choose Name, IP
  • Gather MAC

1. Deploy

  • Deploy operating system
  • Set up SSH key for Ansible
  • Configure and start Salt minion

2. Integrate

  • Get inventory from Salt or Cobbler
  • Set up RSA keys for Salt
  • Register node to services
  • Prepare configuration files
  • Install software

3. Configure

  • Install/update software
  • Broadcast configuration

Ready for jobs

slide-7
SLIDE 7

  • Wrapper for PXE, TFTP, DHCP servers
  • Manage OS images, machine profiles
  • Install operating system
  • Set up hardware-specific configuration (disk partitions, NICs, IPMI, etc.)
  • Set up minimal configuration (Admin SSH keys, Salt minion)

“Cobbler is a Linux installation server that allows for rapid setup of network installation environments.” http://cobbler.github.io

slide-8
SLIDE 8

  • Shell scripts on steroids, with built-in safety, idempotence, APIs
  • One-off operations:
      • register to Zabbix, GLPI, Salt
      • build files: slurm.conf for Slurm, /etc/hosts for dnsmasq, /etc/ssh/ssh_known_hosts for host-based SSH, .dsh/group/all for pdsh
      • create CPU-specific directory for Easybuild

“Ansible seamlessly unites workflow orchestration with configuration management, provisioning, and application deployment in one easy-to-use and deploy platform.” https://www.ansible.com
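The "build files" step amounts to rendering cluster-wide files from a single inventory. A minimal sketch of that idea in plain Python, not the actual playbooks: the inventory layout, the `cluster.example` domain, and the function names are assumptions made for illustration.

```python
def render_hosts(inventory, domain="cluster.example"):
    """Render /etc/hosts content from a {hostname: ip} inventory.

    Entries are sorted so that the same inventory always yields the
    same file, which keeps regeneration idempotent.
    """
    lines = ["127.0.0.1 localhost"]
    for name, ip in sorted(inventory.items()):
        lines.append(f"{ip} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"


def render_pdsh_group(inventory):
    """Render a .dsh/group/all style file: one hostname per line."""
    return "".join(f"{name}\n" for name in sorted(inventory))
```

For example, `render_hosts({"node01": "10.0.0.1"})` produces a localhost line followed by one line for `node01`. In practice the inventory would come from Salt or Cobbler, as the deployment workflow describes.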

slide-9
SLIDE 9

“Scalable, flexible, intelligent IT orchestration and automation” https://saltstack.com

  • Central configuration management server
  • Daily management:
      • configure system: LDAP, NTP, DNS, Slurm, etc.
      • install admin software
      • mount user filesystem (home, scratch, software)

slide-10
SLIDE 10

Unboxing

  • Label, rack, connect
  • Choose Name, IP
  • Gather MAC

1. Deploy

  • Deploy operating system
  • Set up SSH key for Ansible
  • Configure and start Salt minion

2. Integrate

  • Get inventory from Salt or Cobbler
  • Set up RSA keys for Salt
  • Register node to services
  • Prepare configuration files
  • Install software

3. Configure

  • Install/update software
  • Broadcast configuration

  • If new CPU architecture -> Easybuild
  • If new Slurm QOS for specific users -> Slufl

Ready for jobs

slide-11
SLIDE 11

More generally:

  • 1. Deploy: deploy operating system
  • 2. Setup: install software, pre-seed data
  • 3. Manage: install/update software, manage configuration

slide-14
SLIDE 14

Typical development platform: our laptops

  • 1. Deploy: deploy operating system
  • 2. Setup: install software, pre-seed data

slide-15
SLIDE 15

Typical staging platform: our test mini-cluster

  • 2. Setup: install software, pre-seed data
  • 3. Manage: install/update software, manage configuration

slide-16
SLIDE 16

  • Dev: 1. Deploy, 2. Setup
  • Stage: 2. Setup, 3. Manage
  • Prod: 1. Deploy, 2. Setup, 3. Manage

Same playbooks, same server

  • 2. Setup: install software, pre-seed data
  • 3. Manage: install/update software, manage configuration

slide-17
SLIDE 17

Some features overlap

(e.g. install soft)

if soft.is_specific("dev"):                   # e.g. VB guest additions
    vagrant.provision().install(soft)
elif soft.is_specific("hardware"):            # e.g. drivers
    cobbler.kickstart().install(soft)
elif soft.is_useful() in ["stage", "prod"]:   # e.g. zabbix-agent
    salt.install(soft)
else:                                         # needed through all the chain (e.g. slurm)
    ansible.install(soft)
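The decision rule above is slide pseudocode. A runnable rendering of the same rule could look like the sketch below, where the function name `choose_installer`, its parameters, and the returned tool names are assumptions introduced for illustration:

```python
def choose_installer(environments, hardware_specific=False):
    """Pick the tool that should install a package.

    environments: the set of platforms ("dev", "stage", "prod") where the
    package is needed. The checks follow the same order as the pseudocode.
    """
    environments = set(environments)
    if not environments:
        raise ValueError("a package must be needed somewhere")
    if environments == {"dev"}:
        return "vagrant"   # e.g. VB guest additions
    if hardware_specific:
        return "cobbler"   # e.g. drivers, installed at kickstart time
    if environments <= {"stage", "prod"}:
        return "salt"      # e.g. zabbix-agent
    return "ansible"       # needed through all the chain (e.g. slurm)
```

For instance, a package needed on dev, stage and prod alike falls through to Ansible, while a monitoring agent needed only on stage and prod goes to Salt.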

slide-18
SLIDE 18

Gotchas

Uploading a file in Ansible and in Salt:

slide-19
SLIDE 19

Gotchas

Uploading a file in Ansible and in Salt:

Installing a package in Ansible and in Salt:

slide-20
SLIDE 20

What we love about...

... Salt

  • Python, YAML, Jinja, the plethora of modules
  • Declarative style: very powerful, handles complex dependencies
  • Pull: handles nodes that were down when they come back up, etc.
  • Single source of truth, traceability, provenance, accountability
  • Scalability, syndication; manages the whole infrastructure
  • Out-of-band management (second entry point)

... Ansible

  • Python, YAML, Jinja, the plethora of modules
  • Imperative style: simple to grasp, playbooks easy to read, easy to share, easy to reuse in different contexts
  • Effective for manual/emergency firefighting
  • In-band management, standalone (no need for agent, uses SSH)
slide-21
SLIDE 21

Preparing for a new user

LDAP, SSH, Slurm, file system, user environment, ...

slide-22
SLIDE 22

Slufl

Daemon that runs Ansible playbooks when LDAP entries change

LDAP, SSH, Slurm, file system, user environment, ...
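The core of such a daemon can be pictured as comparing two snapshots of the directory and queuing one playbook run per new or modified user. The sketch below illustrates that idea only and is not Slufl's actual code: the snapshot format, the `new_user.yml` playbook name, and the `user` variable are hypothetical, and the commands are built but not executed.

```python
def diff_entries(previous, current):
    """Compare two {uid: attributes} snapshots.

    Returns three sorted lists: added, removed and changed uids.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        uid for uid in set(previous) & set(current)
        if previous[uid] != current[uid]
    )
    return added, removed, changed


def playbook_command(playbook, uid):
    """Build the ansible-playbook command line for one user."""
    return ["ansible-playbook", playbook, "--extra-vars", f"user={uid}"]


def react(previous, current, playbook="new_user.yml"):
    """Return the commands to run for every added or changed entry."""
    added, _removed, changed = diff_entries(previous, current)
    return [playbook_command(playbook, uid) for uid in added + changed]
```

A real daemon would obtain the snapshots from LDAP, loop, and pass each command to something like `subprocess.run`; those parts are left out here.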

slide-23
SLIDE 23

Custom Salt grain for Slurm

top.sls

slide-24
SLIDE 24

  • Ansible and Salt work very well together
  • Complementary
  • Same building blocks
  • Along with Cobbler, a nice team to manage an organically growing Tier-2 compute cluster
slide-25
SLIDE 25

pdsh, clustershell, sshuttle, pandoc

slide-26
SLIDE 26
slide-27
SLIDE 27

Behind the scenes of a FOSS-powered HPC cluster at UCLouvain

Cobbler, Ansible and Salt!

damien.francois@uclouvain.be
@damienfrancois on Twitter, LinkedIn, StackOverflow, GitHub