Maestro Workflow Conductor: A vision for the future of HPC Workflow



SLIDE 1

LLNL-PRES-810817

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Maestro Workflow Conductor: A vision for the future of HPC Workflow

Computing Expo

Francesco Di Natale
Software Engineer
Maestro Project Lead
Computer Scientist (ASQ)

September 30, 2020

SLIDE 2

What is Maestro? What can Maestro do?

SLIDE 3

Maestro Workflow Conductor is an open-source HPC software tool and library that automates software processes

§ Automation of multi-step computational workflows both locally and on supercomputers

— A parameter sweep of a simulation model (setup, simulate, post-process)

§ Parses a human-readable specification that is self-documenting and portable from one user and environment to another

§ Makes it easy to set up and run computation-based studies by abstracting away the details of running on HPC clusters

§ The core design tenets of Maestro focus on:

— encouraging clear workflow communication and documentation
— consistent execution, allowing users to more easily focus on science

SLIDE 4

Maestro handles core functions of running a user’s workflow

1. Run submission and monitoring

Maestro submits, monitors, and restarts jobs. Maestro can also manage the number of jobs submitted to the scheduler at a given time.

2. Workspace management

Maestro manages the study workspace, creating files and ensuring that data from one step or study does not overwrite another.

3. Workflow provenance

Maestro captures the provenance of what is run, including the sampled parameters, study spec, and inputs.

SLIDE 5

Maestro centers around the concept of studies for defining step-wise workflows

§ A list of steps with their dependencies specified
§ Parameters to apply to the list of steps
§ Fixed value substitutions (variables)
§ A study specification is a documented artifact of a user workflow that can be run and repeated
§ A user can write a study by hand or write a program to algorithmically generate study specifications.
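
As a rough illustration of the last point, the sketch below builds the text of a small study specification from a list of names. It is not part of Maestro: the helper name `build_study_spec` is hypothetical, and the layout simply follows the "Hello World" specification shown in this deck.

```python
def build_study_spec(name, description, names):
    """Return the text of a small study specification (hypothetical helper)."""
    # Quote each value so names containing spaces stay valid YAML strings.
    values = ", ".join(f'"{n}"' for n in names)
    lines = [
        "description:",
        f"    name: {name}",
        f"    description: {description}",
        "",
        "study:",
        "    - name: say-hi",
        "      description: Echo a friendly greeting.",
        "      run:",
        "          cmd: |",
        '            echo "Hello, $(NAME)!" > hi_$(NAME).txt',
        "          depends: []",
        "",
        "global.parameters:",
        "    NAME:",
        f"        values: [{values}]",
        "        label: NAME.%%",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated text; redirect it to a .yaml file to keep it.
    print(build_study_spec("Hello_World", "Say hi to everyone!", ["Jim", "Kelly"]))
```

A generator like this lets the list of names come from a file or another program while the specification itself remains a readable artifact that can be saved and re-run.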

SLIDE 6

A simple “Hello World” Maestro study specification.

description:
    name: Hello_World
    description: Say hi to everyone!

study:
    - name: say-hi
      description: Echo hello, world to a file.
      run:
          cmd: |
            echo "Hello, world!" > hi.txt
          depends: []

To run “hello.yaml”, execute the command “maestro run hello.yaml”.

[Figure: the specification annotated with “Study overview” and “User specified steps to be executed”, next to the Maestro DAG in which the study Hello_World leads to the single step say-hi]

SLIDE 7

Adding a parameter to a study is straightforward.

description:
    name: Hello_World
    description: Say hi to everyone!

study:
    - name: say-hi
      description: Echo a friendly greeting.
      run:
          cmd: |
            echo "Hello, $(NAME)!" > hi_$(NAME).txt
          depends: []

global.parameters:
    NAME:
        values: ["Jim", "Kelly", "Michael", "Pam"]
        label: NAME.%%

[Figure: the specification annotated with “Study overview”, “User specified steps to be executed”, and “User specified parameters”, next to the Maestro DAG in which Hello_World fans out to four steps: Hello, Jim; Hello, Kelly; Hello, Michael; Hello, Pam]

A simple “Hello World” Maestro study specification.
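
The effect of the parameter block can be pictured with a few lines of Python. This is only a sketch of the substitution idea, not Maestro's implementation; the function name `expand_step` is hypothetical, and it assumes that `%%` in the label stands for the parameter value.

```python
def expand_step(cmd, name, values, label):
    """Expand one step template into one concrete command per value.

    Returns a dict that maps each label (with %% replaced by the value)
    to the command with every $(NAME)-style token substituted.
    """
    token = f"$({name})"
    expanded = {}
    for value in values:
        key = label.replace("%%", str(value))
        expanded[key] = cmd.replace(token, str(value))
    return expanded


commands = expand_step(
    'echo "Hello, $(NAME)!" > hi_$(NAME).txt',
    "NAME",
    ["Jim", "Kelly", "Michael", "Pam"],
    "NAME.%%",
)
for label, command in commands.items():
    print(label, "->", command)
```

One step template plus four values gives four independent commands, which matches the four "Hello, ..." nodes in the slide's DAG.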

SLIDE 8

How is Maestro designed?

SLIDE 9

Maestro’s core principles center around reproducibility

§ Self-documentation

— Should be documented and easy to document.

§ Consistency

— Should run the same way every time.

§ Repeatability

— Should be easy to repeat.

§ Reproducibility

— All of the above are prerequisites.
— Different from repeatability.
— Requires more extensive metadata capture.

[Figure labels: Documentation, Consistency, Repeatability, Reproducibility]

SLIDE 10

Maestro studies allow users to break workflows down into composable pieces

description:
    name: simple_workflow
    description: A simple workflow.

study:
    - name: run-sim
      description: Submit the simulation.
      run:
          cmd: /usr/gapps/code input.in -def res $(RES)

    - name: post-process
      description: Post process simulation
      run:
          cmd: python process.py -p $(run-sim.workspace)
          depends: [run-sim]

global.parameters:
    RES:
        values: [2, 4, 6]
        label: RES.%%

Annotations on the specification:

Workflow overview (the description block)

  • Name
  • Description
  • Other metadata

Study steps (the study block) specify

  • What gets run
  • The order in which things are run
  • Used to define multistep workflows

Parameter/sample space (the global.parameters block)
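
How the `depends` entries determine execution order can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is an illustration under stated assumptions and not Maestro's own expansion logic: the `expand` helper and the `step/RES.value` node names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Step name -> names of the steps it depends on, taken from the spec above.
STEPS = {"run-sim": [], "post-process": ["run-sim"]}


def expand(steps, parameter, values):
    """Copy every step once per parameter value, keeping dependencies
    between the copies that share the same value."""
    graph = {}
    for value in values:
        suffix = f"{parameter}.{value}"
        for name, deps in steps.items():
            graph[f"{name}/{suffix}"] = [f"{dep}/{suffix}" for dep in deps]
    return graph


graph = expand(STEPS, "RES", [2, 4, 6])
# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(graph).static_order())
for node in order:
    print(node)
```

Each `post-process` copy appears only after the `run-sim` copy with the same `RES` value, while copies with different values remain independent of one another.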

SLIDE 11

Maestro is split between the frontend command line utility and the backend Conductor daemon

§ The benefit of this modular design is that components can be swapped out independently:

— Different specifications could be supported
— Different backends utilizing varying technologies can be seamlessly used

[Figure: stages of a study run, divided between Maestro (frontend) and Conductor (backend)]

  1. Parse the specification and construct the Study
  2. Global workspace is constructed, and initial state saved
  3. Load initial state, and expand the execution graph (DAG)
  4. Monitor and update study state until termination
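
The hand-off through saved state can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the `study_state.json` file, the status strings, and the assignment of the save to the frontend and the monitoring loop to the backend are hypothetical and do not describe Maestro's actual internals.

```python
import json
import tempfile
from pathlib import Path


def frontend_stage(workspace, step_names):
    """Frontend role: build the workspace and save the initial study state."""
    workspace.mkdir(parents=True, exist_ok=True)
    state = {name: "INITIALIZED" for name in step_names}
    state_file = workspace / "study_state.json"  # hypothetical file name
    state_file.write_text(json.dumps(state))
    return state_file


def backend_loop(state_file, max_polls=10):
    """Backend role: load the saved state and update it until every step ends."""
    state = json.loads(state_file.read_text())
    for _ in range(max_polls):
        pending = [name for name, status in state.items() if status != "FINISHED"]
        if not pending:
            break
        # Stand-in for querying a scheduler: finish one step per poll.
        state[pending[0]] = "FINISHED"
        state_file.write_text(json.dumps(state))
    return state


with tempfile.TemporaryDirectory() as tmp:
    path = frontend_stage(Path(tmp) / "study", ["run-sim", "post-process"])
    final_state = backend_loop(path)
print(final_state)
```

Because the two halves communicate only through the saved state, either one could in principle be replaced without touching the other, which is the benefit the slide attributes to the modular design.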

SLIDE 12

Maestro is split between the frontend command line utility and the backend Conductor daemon

[Figure: architecture diagram with the labels File System, Compute Cluster, Login Node, Maestro, Background Conductor, and Scheduler (SLURM, LSF, …)]

SLIDE 13

Maestro’s Software Engineering Strategy and Vision

§ A strong focus on user-centered design and development

— Meet requirements in as lightweight, transparent, and general a manner as possible
— Negotiate requirements to provide features that encourage ease of use and best practices
— Provide as much flexibility as possible, leaving workflow decisions to the user

§ Development of a community that shares a common workflow vocabulary and collaborates around a central core of best practices

— The study specification provides a consistent, step-oriented workflow structure for discussion

§ An emphasis on flexibility, maintainability, and expandability

— Enable users to utilize technologies, but not couple users to them
— Use sound software system design and architecture to promote sustainability
— Enable the creation of a community-driven ecosystem

SLIDE 14

Where is Maestro being used?

SLIDE 15

Maestro is being used to compare nuclear data measurements to compiled libraries

§ Compared data in “Baghdad Atlas” to data libraries

— Gamma-rays produced in neutron-inelastic reactions
— Data libraries include ENDL and ENDF used in applications

§ Maestro used to run ~70 Mercury simulations with GNDS (ENDL 2009.3) data and post-process results to get gamma intensity

§ Next: Add plotting call to Maestro and test additional data evaluations such as ENDF/B-VIII

[Image: Al-Tuwaitha Nuclear Research Facility, Iraq]

§ IRT-5000 reactor “decommissioned” in Operation Desert Storm
§ IAEA shared databook with LBNL, LLNL
§ LBNL created online electronic database

[Plot: Difference, % versus Element Z; legend: GNDS issue, Difference]

SLIDE 16

Study of fragment impacts on explosives is using Maestro to sweep across parameters

§ High Explosive Response to Mechanical Stimulus (HERMES) model used to examine response of high explosive (HE) materials to mechanical insults

— Package in ALE3D
— Maestro with pgen used to sample fragment size and speed for different geometries

§ Next steps: automate post-processing and job submission with Maestro to define “go/no go” boundary

[Plot: time between impact and detonation, Δt (μs), with “go” and “no go” regions; labeled regimes: shock to detonation (SDT), deflagration to detonation (DDT), deflagration]

[Image: example with a 2 cm radius steel sphere at 3400 fps, at t = 6 μs; labels: “HE” and “Steel plate Barrier”]
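
A sweep over fragment size and speed amounts to building one column of values per parameter. The sketch below does this with the standard library only; the parameter names `RADIUS` and `SPEED`, the helper `sample_grid`, and the sample values are hypothetical and are not taken from the study. In an actual pgen file the columns would be handed to Maestro's parameter generator, a step omitted here.

```python
from itertools import product


def sample_grid(radii_cm, speeds_fps):
    """Return parallel parameter columns covering every radius/speed pair.

    The two lists have equal length, and entries at the same index
    belong to the same sample.
    """
    pairs = list(product(radii_cm, speeds_fps))
    return {
        "RADIUS": [radius for radius, _ in pairs],
        "SPEED": [speed for _, speed in pairs],
    }


# Illustrative values only.
columns = sample_grid([1.0, 2.0], [3000, 3400, 3800])
for radius, speed in zip(columns["RADIUS"], columns["SPEED"]):
    print(f"radius={radius} cm, speed={speed} fps")
```

Keeping the columns index-aligned makes it easy to replace the full grid with a different sampling scheme later, for example random draws near a suspected "go/no go" boundary.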

SLIDE 17

Maestro is being used to train a decision-making loop for finding antibodies to SARS-CoV-2 (COVID-19)

§ Agents are spun up and alternate between decision making and executing calculations
§ The individual studies place their structure and results into the history
§ Decision makers choose new mutations on which to run calculations

[Figure: timeline in which Agent 1 and Agent 2 each alternate “Decide” steps with batches of FoldX calculations, alongside a History element]

SLIDE 18

Maestro is improving user productivity in a wide variety of ways

§ Generation of perturbed simulations of a shaped-charge jet and creation of synthetic radiographs to feed a deep learning model along with scalar data from the simulations

— Train the model to link images back to input parameters (surrogate modeling)

§ Pipelining of cardiac simulations and testing of the hyperparameters for an ML model that generates non-invasive cardiac images based on EKG input data

— Led to a patent on the model for generating images

§ The ATOM Modeling Pipeline (AMPL) has used Maestro to predict the safety and pharmacokinetic properties of over 26 million drug-like compounds (GS-CAD)

— When combined with binding affinity calculations, can be used to recommend experimental drugs in the battle against COVID-19
— Dataset released this week: https://covid19drugscreen.llnl.gov/info

SLIDE 19

§ Maestro GitHub

— https://github.com/LLNL/maestrowf

§ Maestro Issue Tracker

— https://github.com/LLNL/maestrowf/issues

§ Maestro Documentation

— https://lc.llnl.gov/confluence/display/MAESTRO
— https://maestrowf.readthedocs.io

§ Mailing List

— maestrowf@llnl.gov

§ Try Maestro

— pip install maestrowf

We are excited to work with the user community in helping to develop and grow their workflows

Get involved!

  • Provide feedback/use cases
  • Submit tickets
  • Become a developer
  • How are you using Maestro?
  • Tell your story.
  • Hang out and join the discussion!

Maestro encourages a supportive and collaborative community for both Maestro developers and users.

SLIDE 20

Disclaimer

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.