HPC the Easy Way Tools and techniques for making the most of your resources RSE Sheffield Seminar Series University of Sheffield 30 July 2019 Phil Tooley HPC Application Analyst Experts in numerical software and High Performance Computing


slide-1
SLIDE 1

HPC the Easy Way

Tools and techniques for making the most of your resources

RSE Sheffield Seminar Series University of Sheffield 30 July 2019 Phil Tooley HPC Application Analyst

Experts in numerical software and High Performance Computing

slide-2
SLIDE 2

Outline

◮ Common HPC Problems
◮ Using the HPC more efficiently
◮ The real world: ShARC
◮ HPC Package Managers
◮ Conda
◮ Spack
◮ The POP-COE

slide-3
SLIDE 3

Common HPC Problems

Two common HPC problems

◮ Why is my job still queuing?
◮ How do I install <package>?

slide-5
SLIDE 5

What the Scheduler does

A bin-packing problem

◮ Plans how to map jobs into nodes as efficiently as possible
◮ No job should wait "too long"
◮ Everyone should get a "fair share"
◮ Small jobs fill gaps around big ones

[Figure: jobs packed by the scheduler. Axes: Requested Runtime, CPU Slots.]

slide-6
SLIDE 6

What the Scheduler does

A bin-packing problem

◮ Gaps appear as jobs finish early or are cancelled
◮ Scheduler backfills gaps as best it can
◮ Smaller jobs have more chances to backfill
◮ Ask for only what you actually need

[Figure: gaps in the schedule being backfilled. Axes: Requested Runtime, CPU Slots.]
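
The backfill idea can be illustrated with a toy sketch. This is not how the ShARC scheduler is implemented: the `Job` class, the `fits_gap` and `backfill` helpers and all the numbers are hypothetical. It only shows that a job is judged on its requested slots and runtime, so a job that over-requests cannot use a gap it would in fact fit.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int               # CPU slots requested
    requested_minutes: int   # walltime requested, not actual runtime


def fits_gap(job, gap_slots, gap_minutes):
    """A job can backfill a gap only if its *request* fits inside it."""
    return job.slots <= gap_slots and job.requested_minutes <= gap_minutes


def backfill(queue, gap_slots, gap_minutes):
    """Greedily pick queued jobs, in queue order, that fit the free slots."""
    chosen = []
    free = gap_slots
    for job in queue:
        if fits_gap(job, free, gap_minutes):
            chosen.append(job.name)
            free -= job.slots
    return chosen


# Hypothetical gap: 4 slots idle for 60 minutes before a big job starts.
queue = [
    Job("big_mpi", slots=16, requested_minutes=600),
    Job("default_limit", slots=1, requested_minutes=480),  # would really run 10 min
    Job("honest_short", slots=1, requested_minutes=20),
    Job("small_smp", slots=2, requested_minutes=45),
]
print(backfill(queue, gap_slots=4, gap_minutes=60))
```

In this sketch `default_limit` is skipped even though it would finish in time, because the scheduler only sees the 480 minute request.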

slide-8
SLIDE 8

The real world picture - ShARC

Mining the scheduler data

◮ Who is using ShARC?
◮ How are they using it?
◮ How efficiently are they using it?

The dataset

◮ Jobs started between 1 July 2017 and 30 June 2018
◮ Only public node data
◮ Failed jobs removed
◮ Sysadmin test jobs removed

slide-9
SLIDE 9

User Breakdown

[Figure: ShARC usage breakdown per user. Axes: User #, Cluster time (%).]

◮ 539 unique users
◮ Heaviest 3 users consumed over 50% of available CPU time

slide-10
SLIDE 10

Job breakdown

[Figure: ShARC usage breakdown by job type. Axes: Job size (core hours), Cluster time (%). Series: MPI, SMP, Single Thread.]

◮ Most time is spent running MPI jobs
◮ ∼ 75% MPI vs. ∼ 25% single node/thread

slide-11
SLIDE 11

Jobs breakdown

[Figure: ShARC job volume by type. Axis: Job volume (%), log scale. Categories: Single Thread, SMP, MPI.]

Job type   Avg. time (min)     Count
Single                 9.0   5275150
SMP                   84.1     32936
MPI                  318.9      5878

◮ Huge volume of very short jobs
◮ Heaviest users submitting > 10^6 short jobs each!

slide-12
SLIDE 12

Jobs breakdown

[Figure: ShARC usage breakdown by job type. Axes: Job size (core minutes), Fraction of jobs (%). Series: MPI, SMP, Single Thread.]

◮ ∼ 50% of ShARC jobs shorter than 1 minute
◮ 50% of scheduler effort spent on only 0.4% of CPU time!

slide-13
SLIDE 13

Runtime Requests and Usage

[Figure: Used fraction of requested runtime. Axes: Used runtime fraction, Fraction of Total Jobs (%). Series: Runlimit specified, Default runlimit.]

◮ Most jobs over-request walltime by at least an order of magnitude
◮ → Lots of missed opportunities to backfill gaps!

slide-14
SLIDE 14

Memory Requests and Usage

[Figure: Used Fraction of Requested Memory. Axes: Used vmem fraction, Fraction of Total Jobs (%). Series: Memlimit specified, Default memlimit.]

◮ Majority of users explicitly request memory
◮ Better usage, but still lots of over-requesting

slide-15
SLIDE 15

Getting Feedback from the Scheduler

Accounting Information

◮ ShARC/Iceberg

$ qacct -j $jobid

◮ Bessemer

$ sacct -j $jobid

◮ Records basic performance information about job

  • Requested resources (time, memory etc.)
  • Actual runtime
  • Actual memory usage
  • Useful CPU time
slide-16
SLIDE 16

Accounting Information

$ qacct -j 1150879
qname            all.q
hostname         sharc-node147.shef.ac.uk
owner            ac1mpt
job_number       1150879
submission_time  2018-04-16 10:00:43
start_time       2018-04-16 10:00:54
end_time         2018-04-19 10:34:48
exit_status
ru_wallclock     261234
granted_pe       mpi
slots            220
cpu              57314572.128644
category         -u ac1mpt -l h_rt=345600,h_vmem=2G -pe mpi 220 -P SHEFFIELD
maxvmem          150.63G
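
The fields in a record like this can be pulled into a script for further checks. A minimal sketch, assuming the record has been captured as text with one `key value` pair per line; `parse_qacct` is a hypothetical helper, not part of the scheduler, and the sample string copies a few values from this slide.

```python
def parse_qacct(text):
    """Split each 'key value' line of a qacct record into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)   # key, then the rest of the line
        if len(parts) == 2 and not parts[0].startswith("="):
            record[parts[0]] = parts[1].strip()
    return record


# A few fields copied from the record for job 1150879.
sample = """\
qname        all.q
job_number   1150879
ru_wallclock 261234
slots        220
cpu          57314572.128644
maxvmem      150.63G
"""

job = parse_qacct(sample)
print(job["ru_wallclock"], job["slots"], job["cpu"], job["maxvmem"])
```

Values are kept as strings, so convert with `int()` or `float()` before doing arithmetic on them.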

slide-19
SLIDE 19

Resource Rules of Thumb

Runtime

◮ Check ru_wallclock: actual run time
◮ Request 1.5–2× ru_wallclock

Memory

◮ Check maxvmem: peak job memory usage
◮ Request 1.5–2× maxvmem
◮ Remember requests are per core

Efficiency

◮ Check cpu: actual CPU usage
◮ Ensure cpu ≃ ru_wallclock × slots
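
These rules of thumb can be applied to the accounting record for job 1150879 as a quick worked check. A minimal sketch: the function names are hypothetical, and treating maxvmem as the total for the whole job (so that it is divided by slots to get a per-core request) is an assumption.

```python
def cpu_efficiency(cpu_seconds, wallclock_seconds, slots):
    """Fraction of the reserved core time that was spent doing CPU work."""
    return cpu_seconds / (wallclock_seconds * slots)


def suggested_request(used, low=1.5, high=2.0):
    """Return the (low, high) range given by the 1.5-2x rule of thumb."""
    return used * low, used * high


# Figures from the qacct record for job 1150879.
ru_wallclock = 261234        # seconds
slots = 220
cpu = 57314572.128644        # CPU seconds summed over all slots
h_rt = 345600                # requested runtime, seconds
h_vmem_gb = 2.0              # requested memory per slot, GB
maxvmem_gb = 150.63          # peak memory, assumed to be the whole-job total

print(f"CPU efficiency:   {cpu_efficiency(cpu, ru_wallclock, slots):.3f}")
print(f"Runtime used:     {ru_wallclock / h_rt:.2f} of request")
print(f"Memory used:      {maxvmem_gb / (h_vmem_gb * slots):.2f} of request")

per_slot_gb = maxvmem_gb / slots
low_gb, high_gb = suggested_request(per_slot_gb)
print(f"Suggested h_vmem: {low_gb:.2f}G to {high_gb:.2f}G per slot")
```

Under that assumption the job is CPU efficient and its runtime request was reasonable, but it used only about a third of the memory it asked for.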

slide-20
SLIDE 20

Common HPC Problems

Two common HPC problems

◮ Why is my job still queuing?
◮ How do I install <package>?

slide-21
SLIDE 21

Automating Software Installation

Package Managers

◮ Automate installation/removal of software
◮ Manage installation of required dependencies
◮ Curate package repositories
◮ Document and reproduce environments

Focus on just two: Conda and Spack

slide-22
SLIDE 22

Conda

Pre-built packages for Python, R, etc.

◮ Originally for the Anaconda Python distribution
◮ Microsoft-provided R packages
◮ Low-level numerical support libraries
◮ Intel Python with MKL-optimised NumPy/SciPy
◮ Designed for users to install what they need

slide-24
SLIDE 24

Installing Conda

Personal machine — Windows, Mac, Linux

◮ Two versions:
  • Anaconda: full distribution with hundreds of packages
  • Miniconda: just Conda and Python
◮ Download from anaconda.com and run installer

ShARC, Bessemer, Iceberg

◮ Already installed:

$ module load conda

slide-26
SLIDE 26

Installing and Managing Packages

Conda Environments

◮ Collections of packages and their dependencies
◮ Isolate individual projects
◮ Test/use multiple versions of a package
◮ Easily capture and reproduce environment elsewhere

Creating Environments

$ conda create --name myenv numpy pystan
$ source activate myenv

slide-30
SLIDE 30

Installing and Managing Packages

Lots of customization options

◮ Choose Python version:

$ conda create --name myenv numpy pystan python=3.7

◮ Package versions:

$ conda create --name myenv numpy pystan=2.17.1

◮ Other channels, e.g. Intel Python:

$ conda create --channel intel --name myenv numpy

◮ Non-Python environments, e.g. R:

$ conda create --channel r --name myRenv r rstudio

slide-33
SLIDE 33

Using Environments

Activating and deactivating

◮ “Activate” an environment to use it:

$ conda activate myenv

◮ Installed Packages are now available to use:

$ python
Python 3.6.8 (default, Mar 10 2019, 17:04:16)
>>> import pystan
>>> import numpy
>>> # etc ...

◮ “Deactivate” the environment to exit:

$ conda deactivate
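
To confirm from inside Python which environment is in use, one option is to inspect the interpreter prefix and the variables Conda sets on activation. A minimal sketch: `describe_environment` is a hypothetical helper, and `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` will be missing when no environment is active.

```python
import os
import sys


def describe_environment(environ=None):
    """Report the interpreter location and the active Conda environment, if any."""
    environ = os.environ if environ is None else environ
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # Set by `conda activate`; missing outside an activated environment.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
    }


for key, value in describe_environment().items():
    print(f"{key:13} {value}")
```

If `prefix` does not point inside the environment you activated, a different Python (for example the system one) is being picked up first.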

slide-35
SLIDE 35

Using Environments

Installing extra packages

◮ Can add extra packages to the environment

$ conda activate myenv
$ conda install scipy scikit-learn #etc ...

◮ And remove unneeded ones

$ conda remove scikit-learn #etc ...

Updating packages

◮ Update all packages to the latest version:

$ conda activate myenv
$ conda update --all

slide-37
SLIDE 37

Exporting Environments

Preserving Environments

◮ Export complete list of packages with versions to a file:

$ conda env export --name myenv > myenv.yml

Recreating Environments

◮ Now take that package list to another machine:

$ conda env create --name myenv_clone --file myenv.yml

◮ myenv_clone is now an exact copy of myenv

  • Collaboration with other users
  • Porting to new machines
  • Publishing for reproducibility

◮ Plain text file listing packages: can also be created/edited by hand

slide-38
SLIDE 38

Conda — Summary

Python and R Package Management

◮ Designed for portability and reproducibility
◮ Rapidly install Python, R etc. packages
◮ Full control of package versioning
◮ Maintain multiple custom package environments
◮ Export, share and duplicate environments

slide-39
SLIDE 39

Spack

Build scientific packages from source

◮ Primarily designed for HPC package management
◮ Build optimised packages for a specific system
◮ “Recipes” to install over 3000 packages
◮ Interoperates with already installed packages
◮ For sysadmins and end-users

slide-41
SLIDE 41

Installing Spack

Requirements

◮ Python >= 2.6
◮ A working compiler (gcc, intel, pgi, etc.)

Installation

$ cd $HOME
$ git clone https://github.com/spack/spack.git
$ export SPACK_ROOT="$HOME/spack"
$ source $SPACK_ROOT/share/spack/setup-env.sh

◮ Install as user in homedir
◮ Use .bashrc to automatically set up

slide-43
SLIDE 43

Configuring Spack

Compiler autodetection

$ spack compilers
==> Available compilers
-- gcc sles12-x86_64
gcc@4.8

Additional compilers

$ module load gcc/8.1.0
$ spack compiler find
==> Added 1 new compiler: gcc@8.1.0

slide-44
SLIDE 44

Configuring Spack

System packages

◮ Often want to use some system packages, e.g.:

  • Vendor optimised MPI
  • System supplied BLAS/LAPACK
  • Avoid compiling again

◮ Specify in packages.yaml

# /home/phil/.spack/linux/packages.yaml
packages:
  netlib-lapack:
    modules: lapack/3.8.0
    buildable: False

slide-46
SLIDE 46

Installing Packages

Search available packages

$ spack list mpi
==> 21 packages.
intel-mpi  mpibash  mpiblast  mpich  openmpi  ...

Install a package

◮ Install “preferred” version

$ spack install openmpi

◮ Specify a version

$ spack install openmpi@2.1.0
slide-47
SLIDE 47

Spack — Summary

HPC Package Management

◮ A heavy-duty package manager
◮ Designed for flexibility and control
◮ Integration with system modules and packages
◮ Full control of package versioning
◮ Build optimised packages from source

slide-48
SLIDE 48

EU H2020 Centre of Excellence (CoE)
1 December 2018 – 30 November 2021
Grant Agreement No 824080

Parallel Performance Optimization and Productivity

slide-49
SLIDE 49

POP CoE

  • A Centre of Excellence
      • On Performance Optimisation and Productivity
      • Promoting best practices in parallel programming
  • Providing FREE Services
      • Precise understanding of application and system behaviour
      • Suggestion/support on how to refactor code in the most productive way
  • Horizontal
      • Transversal across application areas, platforms, scales
  • For (EU) academic AND industrial codes and users!

slide-50
SLIDE 50

Partners

  • Who?
      • BSC, ES (coordinator)
      • HLRS, DE
      • IT4I, CZ
      • JSC, DE
      • NAG, UK
      • RWTH Aachen, IT Center, DE
      • TERATEC, FR
      • UVSQ, FR

A team with

  • Excellence in performance tools and tuning
  • Excellence in programming models and practices
  • Research and development background AND proven commitment in application to real academic and industrial use cases

slide-51
SLIDE 51

Motivation

Why?

  • Complexity of machines and codes
      • Frequent lack of quantified understanding of actual behaviour
      • Not clear most productive direction of code refactoring
  • Important to maximize efficiency (performance, power) of compute intensive applications and productivity of the development efforts

What?

  • Parallel programs, mainly MPI/OpenMP
      • Although also CUDA, OpenCL, OpenACC, Python, …

slide-52
SLIDE 52

FREE Services provided by the CoE

  • Parallel Application Performance Assessment
      • Primary service
      • Identifies performance issues of customer code (at customer site)
      • If needed, identifies the root causes of the issues found and qualifies and quantifies approaches to address them (recommendations)
      • Combines former Performance Audit (?) and Plan (!)
      • Medium effort (1-3 months)
  • Proof-of-Concept (✓)
      • Follow-up service
      • Experiments and mock-up tests for customer codes
      • Kernel extraction, parallelisation, mini-apps experiments to show effect of proposed optimisations
      • Larger effort (3-6 months)

Note: Effort shared between our experts and customer!

slide-53
SLIDE 53

The Process …

When? December 2018 – November 2021

How?

  • Apply
      • Fill in small questionnaire describing application and needs
        https://pop-coe.eu/request-service-form
      • Questions? Ask pop@bsc.es
  • Selection/assignment process
  • Install tools @ your production machine (local, PRACE, …)
  • Interactively: Gather data → Analysis → Report