HPC-SIG Ecosystem Validation, Jan. 14 2019, Baptiste Gerondeau

SLIDE 1

HPC-SIG Ecosystem Validation

  • Jan. 14 2019

Baptiste Gerondeau Renato Golin

SLIDE 2

For more info visit

linaro.org/hpc

HPC-SIG Lab and Validation Matrix

Aggregate machines in the same infrastructure, and validate their performance using a Validation Matrix

  • Validation Matrix must be applicable to every machine
  • Validation Matrix dimensions are software configurations

To generate as few tests as possible, we need to simplify the matrix without losing information
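A minimal sketch of how the matrix size reacts when a dimension no longer has to be varied. The dimension names and values are illustrative assumptions, not the lab's actual configuration, and "pinning" a dimension is only one possible way of simplifying the matrix.

```python
from itertools import product

# Hypothetical dimensions and values, for illustration only.
DIMENSIONS = {
    "provisioning": ["stateless", "stateful", "ansible"],
    "kernel": ["distro", "erp"],
    "os": ["os_a", "os_b", "os_c"],
}


def expand(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


def pin(dimensions, fixed):
    """Return a copy of the dimensions where the ones in `fixed` keep a single value."""
    return {
        name: [fixed[name]] if name in fixed else values
        for name, values in dimensions.items()
    }


full = expand(DIMENSIONS)
# Assume the provisioning method is dictated by the hardware, so it is not varied.
reduced = expand(pin(DIMENSIONS, {"provisioning": "ansible"}))
print(len(full), len(reduced))
```

Every dimension that can be fixed this way divides the number of generated tests by the number of values it used to take.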

SLIDE 3

HPC-SIG Lab’s Infrastructure

The infrastructure needs to:

  • Dispatch jobs (tests, provisioning, benchmarks)
  • Provide DHCP/TFTP services
  • Provide package cache services
  • Provide a secure file/results storage service
  • Be low-maintenance
  • Be replicable anywhere else

SLIDE 4

Simplifying Infrastructure

Identifying the different dimensions

A Vertical Slice of the Stack

Principal dimensions:

  ➔ Application
  ➔ HPC environment stack
  ➔ Machine provisioning

  • HPC stack: OpenHPC
  • Validation application: OpenHPC’s test suite

SLIDE 5

Simplifying Infrastructure

Identifying the different dimensions

The Stack from the Lab’s point of view

Machine provisioning:

  ➔ Network configuration
  ➔ Kernel
  ➔ OS
  ➔ HPC Stack

  • Multiple ways to do the provisioning

SLIDE 6

Simplifying Infrastructure

Identifying the different dimensions

Provisioning Method Variations

Multiple ways to provision:

  ➔ Warewulf Stateless (VNFS)
  ➔ Warewulf Stateful (OS image)
  ➔ Ansible

SLIDE 7

Simplifying Infrastructure

Identifying the different dimensions

Different Network Layouts

  • Flat: Machines reachable from anywhere
  • Tree: Machines reachable from cluster head node only
  • Root: Master with DHCP/TFTP server

SLIDE 8

Simplifying Infrastructure

Identifying the different dimensions

Different Kernels

  • Upstream from OS
  • ERP: Enterprise Reference Platform
  • Contains support for platforms in the process of being upstreamed

SLIDE 9

Simplifying Infrastructure

Identifying the different dimensions

Different Operating Systems

  • 3 OSes available to the user
  • No Debian support in OpenHPC
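A constraint like the missing Debian support can be expressed as a filter over (stack, OS) pairs. This is only a sketch: the slide does not name the three OSes, so the OS names below are placeholders, and only the OpenHPC/Debian rule comes from the slide.

```python
# Only the OpenHPC/Debian rule comes from the slide; other names are placeholders.
UNSUPPORTED = {("openhpc", "debian")}


def is_supported(stack, os_name):
    """Return False for (stack, OS) pairs known not to work together."""
    return (stack, os_name) not in UNSUPPORTED


# Hypothetical OS list: the slide does not say which three OSes are offered.
oses = ["os_a", "os_b", "debian"]
openhpc_targets = [name for name in oses if is_supported("openhpc", name)]
print(openhpc_targets)
```

Keeping the rules in one table means an unsupported pair never turns into a test job in the first place.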

SLIDE 10

Simplifying Infrastructure

Abstractions, and the user’s environment

Abstracting Network Variations

  • Invisible to the user
  • Handled by the lab installer
  • Dependent on hardware

SLIDE 11

Simplifying Infrastructure

Abstractions, and the user’s environment

Abstracting Provisioning Variations

  • Multi-staged provisioning
  • Coexistence
  • Dependent on hardware

SLIDE 12

Simplifying Infrastructure

Abstractions, and the user’s environment

Abstracting Environment Variations

  • Control over HPC Stack
  • Common OS configuration
  • Idempotency
  • Package Caches
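Idempotency here means that applying the same configuration step twice leaves the machine in the same state as applying it once. A minimal sketch of that property, assuming a plain-text configuration file; the file name and the cache address are hypothetical and not taken from the lab's recipes:

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True only if the file changed."""
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    if line in lines:
        return False  # already configured: nothing to do
    lines.append(line)
    path.write_text("\n".join(lines) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "repo.conf"  # hypothetical file name
    setting = "proxy=http://cache.example:3128"  # hypothetical package cache address
    first = ensure_line(conf, setting)   # creates the file and adds the line
    second = ensure_line(conf, setting)  # finds the line and changes nothing
    content = conf.read_text()

print(first, second)
```

Returning whether anything changed mirrors how configuration tools such as Ansible report a task as changed or unchanged.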

SLIDE 13

Simplifying Infrastructure

Abstractions, and the user’s environment

Accounting for extra HPC services

  • InfiniBand support
  • Lustre server support
  • Future additional features (additional hardware)

SLIDE 14

Simplifying Infrastructure

What the User sees, configures

The Lab’s Interface

  ➔ Choose Application
      ❖ Lab picks default configuration
      ❖ User fine-tunes configuration
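The "default, then fine-tune" flow can be sketched as a lookup of per-application defaults followed by user overrides. The application name, the configuration keys and their values are assumptions made for illustration; the slides do not describe the real interface at this level.

```python
# Hypothetical defaults: application name, keys and values are illustrative only.
DEFAULTS = {
    "openhpc-testsuite": {
        "os": "os_a",
        "kernel": "distro",
        "provisioning": "stateless",
    },
}


def resolve(application, overrides=None):
    """Start from the lab's default for the application, then apply user overrides."""
    config = dict(DEFAULTS[application])  # copy, so the defaults stay untouched
    for key, value in (overrides or {}).items():
        if key not in config:
            raise ValueError(f"unknown setting: {key}")
        config[key] = value
    return config


default_config = resolve("openhpc-testsuite")
tuned_config = resolve("openhpc-testsuite", {"kernel": "erp"})
print(tuned_config)
```

Rejecting unknown keys keeps the user inside the dimensions the lab actually exposes.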

SLIDE 15

Validation matrix

Cluster Deployment

SLIDE 16

Validation matrix

Distributed Applications Enablement

SLIDE 17

Validation matrix

Toolchain Benchmarking

SLIDE 18

Validation matrix

Library Enablement and Enhancement

SLIDE 22

Future

  • Vendors to rely on Linaro for base OSS validation
      ○ We have multiple vendors available
      ○ On a standardised infrastructure
  • Share our work
      ○ OpenHPC Ansible recipes (with the OpenHPC community)
      ○ SDI (MrP, Jenkins, Ansible) helping members to replicate our work
      ○ Community CI (OpenHPC test-suite, MPI MTT, OpenMP tests, OpenBLAS CI)
  • Allow our engineers to develop the ecosystem
      ○ Internal tests and benchmarks (via Jenkins, no infrastructure knowledge needed)
      ○ Testing new packages, libraries, compilers (comparison jobs, CI results, statistical analysis)
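A comparison job could, for instance, reduce two sets of benchmark runs to a relative change in mean run time. This is a sketch under that assumption, not the lab's actual analysis, and the timings are placeholders rather than measurements.

```python
from statistics import mean, stdev


def summarise(samples):
    """Return the mean and sample standard deviation of a list of timings."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def relative_change(baseline, candidate):
    """Relative change of the candidate mean against the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


# Placeholder timings in seconds, not measurements from the lab.
baseline_runs = [10.0, 12.0]
candidate_runs = [9.0, 11.0]
change = relative_change(baseline_runs, candidate_runs)
print(summarise(baseline_runs), f"{change:+.1%}")
```

A negative value means the candidate (a new compiler or library, say) ran faster than the baseline; a real job would also check the spread before calling the difference meaningful.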

HPC Lab Setup: https://github.com/Linaro/hpc_lab_setup

Ansible OpenHPC installation recipe: https://github.com/Linaro/ansible-playbook-for-ohpc

SLIDE 23

Thanks!