Dynamic provisioning and execution of HPC workflows using Python (PowerPoint presentation)


SLIDE 1

Dynamic provisioning and execution of HPC workflows using Python

Chris Harris, Patrick O’Leary, Michael Grauer, Aashish Chaudhary, Chris Kotfila and Robert O’Bara

SLIDE 2

Overview

  • Motivation
  • HPC Workflows
  • HPC Resources
  • Cluster provisioning
  • Data management
  • Job submission
  • Workflow orchestration
  • Result/Applications
  • Conclusion
SLIDE 3

Motivation

  • HPC workflows have enabled significant research advances
  • Barriers to widespread adoption remain

○ Complex to use
○ Require specialist local expertise
○ Expensive dedicated hardware

SLIDE 4

Cumulus

  • Platform for dynamic provisioning and execution of HPC workflows
  • Intended to make HPC workflows more accessible to developers
  • Key functionality

○ Cluster provisioning
○ Data management
○ Job submission
○ Workflow orchestration

SLIDE 5

HPC Workflows

  • Tasks executed in order to carry out some computation on an HPC resource

  • Jobs running on HPC resources

○ Simulation code
○ Data processing

  • Auxiliary tasks run outside HPC resources

○ Transferring input data to HPC resource
○ Post-processing of results

SLIDE 6

HPC Resources

  • “Traditional” HPC Resources

○ Dedicated hardware using sophisticated interconnects

  • “Dynamic” HPC Resources

○ Built on demand from virtual servers in a public or private cloud
  ■ AWS EC2
  ■ OpenStack
○ Size and characteristics tailored to the workflow
○ Only pay for what you use
○ Interconnects are significantly slower
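Tailoring machine type and cluster size to a workflow can be pictured as building a launch specification for the cloud API. The helper below is a minimal sketch: the field names mirror those of boto3's EC2 `run_instances` call, but the function itself, its defaults, and the example image ID are illustrative, not part of Cumulus.

```python
# Sketch: build a launch specification for a dynamic cluster.
# Field names mirror boto3's EC2 run_instances keyword arguments;
# the helper and the example values below are illustrative.

def build_launch_spec(image_id, instance_type, node_count, key_name):
    """Return keyword arguments for an EC2 run_instances call,
    sized and typed for a specific workflow."""
    return {
        "ImageId": image_id,            # workflow-specific machine image
        "InstanceType": instance_type,  # machine type tailored to the workflow
        "MinCount": node_count,         # cluster size tailored to the workflow
        "MaxCount": node_count,
        "KeyName": key_name,            # key-based SSH access to the nodes
    }

# Hypothetical image ID and key name, for illustration only.
spec = build_launch_spec("ami-0abcdef", "c5.4xlarge", 8, "cluster-key")
# With boto3 this would be passed as:
#   boto3.client("ec2").run_instances(**spec)
```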

SLIDE 7

Design principles

  • Hide complexity associated with HPC workflows

○ Application development rather than infrastructure

  • Allow workflows to be portable across HPC resources
  • Expose RESTful endpoints

○ Language agnostic for clients
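Because the platform exposes RESTful endpoints, any HTTP client in any language can drive it. The sketch below builds (but does not send) such a request with only the Python standard library; the endpoint path, payload shape, and token header are assumptions for illustration, not the documented Cumulus API.

```python
# Sketch: constructing an HTTP request against a RESTful endpoint.
# The URL path, JSON payload, and auth header are hypothetical.
import json
import urllib.request

def make_cluster_request(base_url, config, token):
    """Build (but do not send) a POST request that would create a cluster."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/clusters",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Girder-Token": token,  # Girder-style token auth; illustrative
        },
    )

req = make_cluster_request("http://localhost:8080/api/v1",
                           {"name": "demo", "size": 4}, "TOKEN")
# Sending it would be: urllib.request.urlopen(req)
```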

SLIDE 8

Cluster provisioning

  • Launch and provision dynamic clusters tailored to a specific workflow
  • Process composed of two steps

○ Launching
○ Runtime provisioning

  • Ansible

○ Automation tool for system configuration and software deployment
○ Declarative operations defined through
  ■ Reusable roles
  ■ Use-case-specific playbooks
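A use-case-specific playbook applying a reusable role might look like the fragment below. This is an illustrative sketch in standard Ansible playbook syntax; the role name, group name, and variables are hypothetical, not Cumulus's actual roles.

```yaml
# Illustrative use-case playbook: apply a reusable role to compute nodes.
# Host group, role name, and variables are hypothetical.
- hosts: compute
  become: true
  roles:
    - role: mpi                       # reusable role: install and configure MPI
      vars:
        mpi_implementation: openmpi   # use-case-specific choice
```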

SLIDE 9

Cluster provisioning - Launching

  • Creating the virtual servers in the cloud environment

○ Tailor machine type and cluster size

  • Machine images

○ Template from which virtual servers are created
○ Base operating system and software
○ Workflow-specific images
  ■ Pre-installed software stack
  ■ Reproducible environment
  ■ Reduce cluster startup time

SLIDE 10

Cluster provisioning - Runtime provisioning

  • Runtime configuration

○ E.g. configuration involving network topology

  • Built-in support for an MPI environment using SGE
  • Additional playbooks can be added

○ E.g. Apache Spark.

SLIDE 11

Data management

  • HPC workflows are data driven

○ Cluster and input configurations
○ Output dataset
○ Performance statistics

  • Appropriate access controls needed
  • Girder

○ Open-source web-based data management platform
○ Exposes RESTful endpoints
○ Provides Cumulus with three key pieces of functionality
  ■ Data organization and access
  ■ User management and authentication
  ■ Authorization management
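The authorization management Girder supplies can be pictured as resolving a user's effective access level (directly and via group membership) against a folder's access-control list. The code below is a toy stand-in for that idea, not Girder's actual API; all names and levels are illustrative.

```python
# Toy sketch of role-based authorization of the kind Girder provides
# for workflow data (not Girder's actual API; names are illustrative).

ACCESS_LEVELS = {"none": 0, "read": 1, "write": 2, "admin": 3}

def can_access(folder_acl, user, groups, required):
    """Return True if `user` (or one of their `groups`) holds at least
    the `required` access level on a folder's access-control list."""
    level = ACCESS_LEVELS[folder_acl.get("users", {}).get(user, "none")]
    for g in groups:
        g_level = ACCESS_LEVELS[folder_acl.get("groups", {}).get(g, "none")]
        level = max(level, g_level)   # effective level is the strongest grant
    return level >= ACCESS_LEVELS[required]

acl = {"users": {"alice": "admin"}, "groups": {"sim-team": "read"}}
can_access(acl, "bob", ["sim-team"], "read")   # True, via group membership
can_access(acl, "bob", ["sim-team"], "write")  # False, group grant too weak
```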

SLIDE 12

Job submission

  • Cumulus uses conventional job schedulers

○ SGE, PBS and Slurm (+NEWT)

  • Provides a scheduler abstraction
  • Access to HPC resources through SSH

○ Key-based authentication
○ Provides a secure and standard interface to a variety of
  ■ Public and private traditional HPC resources
  ■ Cloud-based HPC resources
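A scheduler abstraction can be pictured as one job description rendered into scheduler-specific batch scripts, which are then copied and submitted over SSH. The sketch below shows the idea for Slurm and SGE; the directive choices and job values are illustrative, not Cumulus's actual templates.

```python
# Sketch: render one job description into scheduler-specific batch scripts.
# Directive choices and example values are illustrative.

def render_batch_script(scheduler, name, nodes, command):
    """Return a batch script for the given scheduler."""
    if scheduler == "slurm":
        header = [f"#SBATCH --job-name={name}", f"#SBATCH --nodes={nodes}"]
    elif scheduler == "sge":
        header = [f"#$ -N {name}", f"#$ -pe mpi {nodes}"]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return "\n".join(["#!/bin/bash"] + header + [command]) + "\n"

script = render_batch_script("slurm", "pyfr-sim", 4, "mpirun ./simulate")
# The script would then be submitted over SSH with `sbatch` (Slurm)
# or `qsub` (SGE/PBS).
```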

SLIDE 13

Workflow orchestration

  • Efficient and scalable

○ Workflows are potentially very long lived
○ Consume minimal resources while monitoring HPC jobs

  • Combines cluster provisioning, data management and job submission into a workflow

  • Workflow topology

○ Simple linear flows
○ Complex flows containing branches and loops
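One common way long-lived workflows keep monitoring cheap is to poll job status with exponential backoff, so an idle job costs almost nothing to watch. The sketch below is a generic illustration of that pattern, with a simulated status source; it is not Cumulus's actual monitoring code.

```python
# Sketch: poll a job's status cheaply with exponential backoff.
# The status values and the simulated job below are illustrative.

def wait_for_job(poll_status, sleep, max_delay=60.0):
    """Poll `poll_status()` until a terminal state, backing off between polls."""
    delay = 1.0
    while True:
        state = poll_status()
        if state in ("complete", "error"):
            return state
        sleep(delay)                  # yield resources while the job runs
        delay = min(delay * 2, max_delay)

# Simulated run: the job reaches "complete" on the third poll.
states = iter(["queued", "running", "complete"])
delays = []                           # record sleeps instead of really waiting
result = wait_for_job(lambda: next(states), delays.append)
# result == "complete"; delays == [1.0, 2.0]
```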

SLIDE 14

Workflow orchestration - TaskFlow

  • TaskFlow - A simple yet powerful workflow engine built on Celery
  • Celery

○ Open-source asynchronous task queue
○ Tasks are simple Python functions
○ Simple linear scaling
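Since Celery tasks are simple Python functions, a linear flow amounts to passing each task's result to the next. The toy stand-in below mimics that sequencing synchronously; real Celery dispatches the same functions asynchronously through a message broker, and the task names here are hypothetical.

```python
# Toy stand-in for a linear task flow: each task is a plain Python
# function, and the runner passes each result to the next task.
# (Real Celery runs these asynchronously via a message broker.)

def create_cluster(config):
    return {"cluster": config["name"], "state": "running"}

def upload_input(cluster):
    return {**cluster, "input": "staged"}

def submit_job(cluster):
    return {**cluster, "job": "submitted"}

def run_chain(start, *tasks):
    """Execute tasks in order, feeding each result to the next."""
    result = start
    for task in tasks:
        result = task(result)
    return result

flow = run_chain({"name": "demo"}, create_cluster, upload_input, submit_job)
# flow accumulates the state of provisioning, data staging, and submission
```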

SLIDE 15

Applications - HPCCloud

  • Web-based simulation environment

○ High-level workflows
○ Simple, intuitive web UI

  • Motivated Cumulus development
  • Implements a number of workflows

○ PyFR simulations
○ ParaViewWeb visualization

SLIDE 16

Applications - ModelBuilder

  • Computational Model Builder (CMB) framework

○ Advanced simulation workflows on the desktop

  • Multiphysics workflows

○ Particle accelerator simulations

  • Qt desktop application

○ API validation in non-web environment

SLIDE 17

Conclusion

  • Cumulus is a novel platform for developing end-to-end HPC workflows

○ Targeting traditional and cloud-based HPC resources

  • The platform provides

○ Cluster provisioning
○ Data management
○ Job submission
○ Workflow orchestration

  • Its capabilities have been demonstrated in a variety of end-user applications