SCR and Preparing for Burst Buffers

SLIDE 1

SCR and Preparing for Burst Buffers

DOE COE Performance Portability Meeting
August 23, 2017
Elsa Gonsiorowski

LLNL-PRES-737156

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

SLIDE 2

Outline

Burst Buffer Technologies
SCR Overview
Burst Buffers and SCR
Additional Software Projects

SLIDE 4

Burst Buffer Technologies

Type            Technology     Location
Node Local      IBM BBAPI      LLNL (Sierra)
Machine Global  Cray Datawarp  LANL (Trinity)

How can an application utilize this layer for I/O workloads?

SLIDE 5

Burst Buffers Use Case

Relies on integration with resource scheduler
Different for machine-global vs. node-local storage
Does not address inter-job data movement

SLIDE 6

Burst Buffers Use Case

Perfect for Checkpoint/Restart

SLIDE 11

Checkpoint Restart

a.k.a. Defensive I/O
Related to the size of system memory
Depends on resiliency of machine
  Which may change over time
Creating a checkpoint may not be as efficient as recomputing
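
The trade-off between checkpoint cost and recomputation is often estimated with Young's first-order approximation for the checkpoint interval. This formula does not appear in the slides; it is a common rule of thumb, included here as a sketch. The symbols are introduced for illustration: \delta is the time to write one checkpoint (which grows with the amount of memory saved), M is the mean time between failures of the machine, and the approximation assumes \delta is much smaller than M.

```latex
\tau_{\mathrm{opt}} \approx \sqrt{2\,\delta\,M}

% Worked example with illustrative numbers:
%   \delta = 5 min, M = 24 h = 1440 min
\tau_{\mathrm{opt}} \approx \sqrt{2 \cdot 5 \cdot 1440} = \sqrt{14400} = 120 \text{ min}
```

If M changes over the life of the machine, the interval would need to be re-estimated. When \delta approaches M, checkpointing no longer pays for itself under this model.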

SLIDE 13

SCR Goal

Enable checkpointing applications to take advantage of system storage hierarchies
  Efficient file movement between storage layers
  Data redundancy operations

SLIDE 14

SCR Components

SLIDE 15

SCR Component: Backend Library

Redirect application files
Synchronous & asynchronous flush operations
  Hardware specific capabilities
Data redundancy
Support for both checkpoint & output data

SLIDE 16

SCR Component: Backend Library

int rc = MyApp_Checkpoint(path);

SLIDE 17

SCR Component: Backend Library

SCR_Route_file(path, newpath);
int rc = MyApp_Checkpoint(newpath);

SLIDE 18

SCR Component: Backend Library

SCR_Start_output("dataset name", flags);
SCR_Route_file(path, newpath);
int rc = MyApp_Checkpoint(newpath);
SCR_Complete_output(rc);
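
The four calls above can be put into one routine, sketched below under stated assumptions. So that it compiles without the SCR library, the three `SCR_*` functions and the constants are replaced by local stand-in stubs; they mimic the call shapes only and are not the library's implementation. `MyApp_Checkpoint` is a hypothetical writer assumed to return 0 on success, and `checkpoint_with_scr` is a name introduced for illustration. The sketch passes an explicit success flag to `SCR_Complete_output` rather than the raw return code, so the meaning of the argument does not depend on the application's return convention.

```c
#include <stdio.h>

/* Stand-in constants; with the real library these come from scr.h. */
#define SCR_MAX_FILENAME 1024
#define SCR_SUCCESS 0
#define SCR_FLAG_CHECKPOINT 1

/* Stand-in stubs (NOT the real library) so the sketch compiles alone. */
static int SCR_Start_output(const char *name, int flags) {
    (void)name;
    (void)flags;
    return SCR_SUCCESS;
}

static int SCR_Route_file(const char *name, char *file) {
    /* The real call returns a path in cache storage; the stub only tags the name. */
    int n = snprintf(file, SCR_MAX_FILENAME, "%s.routed", name);
    return (n > 0 && n < SCR_MAX_FILENAME) ? SCR_SUCCESS : 1;
}

static int SCR_Complete_output(int valid) {
    return valid ? SCR_SUCCESS : 1;
}

/* Hypothetical application writer: returns 0 on success. */
static int MyApp_Checkpoint(const char *path) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return 1;
    }
    int ok = (fprintf(fp, "step=42\n") > 0);
    if (fclose(fp) != 0) {
        ok = 0;
    }
    return ok ? 0 : 1;
}

/* Write one checkpoint through the routed path; returns 0 on success. */
int checkpoint_with_scr(const char *path) {
    char newpath[SCR_MAX_FILENAME];
    int rc = 1;

    if (SCR_Start_output("dataset name", SCR_FLAG_CHECKPOINT) != SCR_SUCCESS) {
        return 1;
    }
    if (SCR_Route_file(path, newpath) == SCR_SUCCESS) {
        rc = MyApp_Checkpoint(newpath);   /* write to the redirected path */
    }
    /* Always close the output phase, reporting whether the write was valid. */
    return (SCR_Complete_output(rc == 0) == SCR_SUCCESS) ? 0 : 1;
}
```

Calling `checkpoint_with_scr("ckpt_demo.dat")` with the stubs writes `ckpt_demo.dat.routed` in the current directory. With the real library, the stubs would be removed in favor of `scr.h`, and the calls would sit between MPI and SCR initialization and finalization.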

SLIDE 22

SCR Component: Frontend Scripts

On Startup
  Locate most recent checkpoint and fetch for restart
Within Allocation
  Detect application crash or system failures and trigger restart
During Execution
  Manage datasets
Resource Scheduler Integration
  Pre- and post-stage data movement

SLIDE 24

SCR Component: Configurations

Define the levels of the hierarchy
Define modes/groups of failure
Define checkpointing and data residency needs

Machine Portability
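
One way to picture such a configuration is as a table of storage levels, each with a failure group and a checkpoint interval. The sketch below is hypothetical and does not reproduce SCR's configuration format: the `level_t` type, the `pick_level` function, the paths, and the group names are all made up for illustration. It shows how a per-machine table could keep the application code unchanged while the hierarchy differs between systems.

```c
#include <stddef.h>

/* Hypothetical description of one storage level (not SCR's config format). */
typedef struct {
    const char *store;      /* where checkpoints at this level are written */
    const char *fail_group; /* processes assumed likely to fail together */
    int interval;           /* use this level every `interval` checkpoints */
} level_t;

/* Made-up example hierarchy: memory, node-local burst buffer, parallel file system. */
const level_t demo_levels[] = {
    { "/dev/shm",       "NODE",  1  },
    { "/mnt/bb",        "NODE",  4  },
    { "/p/lustre/ckpt", "WORLD", 16 },
};
const size_t demo_level_count = sizeof demo_levels / sizeof demo_levels[0];

/* Pick the level for checkpoint number `id` (1-based): the last level in the
 * table whose interval divides id. Falls back to level 0 if none match. */
size_t pick_level(const level_t *levels, size_t n, int id) {
    size_t chosen = 0;
    for (size_t i = 0; i < n; i++) {
        if (levels[i].interval > 0 && id % levels[i].interval == 0) {
            chosen = i;
        }
    }
    return chosen;
}
```

With this table, most checkpoints stay in the fastest level, every fourth one goes to the burst buffer, and every sixteenth reaches the parallel file system. Porting to another machine would mean editing only the table.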

SLIDE 25

Burst Buffers Use Case

Checkpoint Restart

SLIDE 26

Burst Buffers & SCR: Prestage

Machine Global: Solved
  Global access from compute nodes (CNs) to storage
Node Local: Requires new software
  Requires deep integration with resource scheduler
  Most useful for DATs (dedicated application times) or jobs using half the system or more

SLIDE 27

Burst Buffers & SCR: Poststage

Similar solution for both BB types
Take advantage of asynchronous operations in vendor APIs
Decouples burst buffer usage from compute usage
  Requires integration with resource scheduler
  Allows for more fine-grained control of resources

SLIDE 28

Unaddressed Concerns

Applications without checkpointing
Shared Files
Arbitrary data movement
  Machine-learning use case

SLIDE 29

VELOC

Combining two codes: FTI and SCR
  FTI: variable-based checkpointing scheme
Will support existing FTI and SCR applications

SLIDE 30

UnifyCR

User-level file system
Shared namespace across distributed burst buffers
I/O interception layer

SLIDE 31

MPI File Utils

Use parallel processes to perform file operations
Executed within a job allocation
  dbcast: broadcast from PFS to node-local storage
  dcp: multiple file copy in parallel
  drm: delete files in parallel
  many more
https://github.com/hpc/mpifileutils

SLIDE 32

SCR Team

https://github.com/llnl/scr
Kathryn Mohror
Adam Moody
Greg Becker
Elsa Gonsiorowski
