

SLIDE 1

Parallel scripting with Swift for applications at the petascale and beyond

VecPar PEEPS Workshop, Berkeley, CA – June 22, 2010
Michael Wilde – wilde@mcs.anl.gov
Computation Institute, University of Chicago and Argonne National Laboratory

www.ci.uchicago.edu/swift

SLIDE 2

Problems addressed by Swift

  • Many applications need loosely coupled scripting
  • Swift harnesses parallel & distributed resources through a simple scripting language
  • Productivity gains come from enabling the use of more powerful systems with less concern for the mechanics

SLIDE 3

Modeling uncertainty for CIM‐EARTH

Parallel AMPL workflow by Joshua Elliott, Meredith Franklin, Todd Munson, Allan Espinosa.

SLIDE 4

Fast Ocean Atmosphere Model (MPI)

[Figure: FOAM at NCAR, from manual configuration, execution, and bookkeeping to automated runs with VDS on the TeraGrid; visualization courtesy of Pat Behling and Yun Liu, UW Madison]

Work of Veronica Nefedova and Rob Jacob, Argonne

SLIDE 5

Problem: Drug screening at APS
(Mike Kubal, Benoit Roux, and others)

Starting from 2M+ ligands, O(millions) of drug candidates are screened down to O(tens) of fruitful candidates for wetlab & APS.

SLIDE 6

[Workflow figure: the virtual screening pipeline. Inputs: one protein receptor (~1 MB; PDB protein descriptions manually prepped into DOCK6 and FRED receptor files, each defining the pocket to bind to) and 2M ZINC 3-D ligand structures (~6 GB). DOCK6/FRED docking: ~4M tasks × 60 s × 1 CPU ≈ 60K CPU-hours; select best ~5K. Amber scoring (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: gen nabscript, 5. RunNABScript, using a NAB script template and parameters defining flexible residues and #MDsteps): ~10K tasks × 20 min × 1 CPU ≈ 3K CPU-hours; select best ~500. GCMC: ~500 tasks × 10 hr × 100 CPUs ≈ 500K CPU-hours.]

Work of Andrew Binkowski and Michael Kubal
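
A hedged Swift sketch of the first screening stage (the app declaration, flags, type names, and mappers below are illustrative assumptions, not the production script):

    type PDBFile;     // receptor structure
    type MOL2File;    // ligand 3-D structure from ZINC
    type ScoreFile;   // docking result

    // Hypothetical wrapper for the DOCK6 binary; the real flags differ.
    app (ScoreFile score) dock (PDBFile receptor, MOL2File ligand) {
      dock6 "-receptor" @receptor "-ligand" @ligand "-out" @score;
    }

    PDBFile receptor <"receptor.pdb">;
    MOL2File ligands[ ] <simple_mapper; prefix="zinc">;
    ScoreFile scores[ ] <simple_mapper; prefix="score">;

    // Each docking task is independent, so all ~2M of them can run
    // in parallel as resources permit.
    foreach lig, i in ligands {
      scores[i] = dock(receptor, lig);
    }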

SLIDE 7

Problem: preprocessing and analysis of neuroscience experiments

[Workflow figure: many data files (the 3a, 4a, 5a, 6a image/header sets, ref and atlas files, and atlas_x/y/z .jpg and .ppm outputs) flow through many application programs (align_warp, reslice, softmean, slicer, convert).]

SLIDE 8

Automated image registration for spatial normalization

[Figure: the AIRSN workflow (reorientRun, alignlinearRun, resliceRun, reslice_warpRun, random_select, softmean, alignlinear, combinewarp, strictmean, gsmoothRun, binarize) and its expanded form, one node per task: reorient/01–/57, alignlinear/03–/17, reslice/04–/12, reslice_warp/22–/38, softmean/13, combinewarp/21, strictmean/39, binarize/40, gsmooth/41–/50.]

SLIDE 9

Swift programs

  • A Swift script is a set of functions
    – Atomic functions wrap & invoke application programs (on parallel compute nodes)
    – Composite functions invoke other functions (and run in the Swift engine); see the sketch after this list
  • Data is typed as composable arrays and structures of files and simple scalar types (int, float, string)
  • Collections of persistent file structures are mapped into this data model as arrays and structures
  • Variables are single-assignment
  • Expressions and statements are executed concurrently, in data-flow dependency order
  • Members of datasets can be processed in parallel
  • Provenance is gathered as scripts execute
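A minimal sketch of the atomic/composite distinction (sortFile, sortAll, and the textfile type are hypothetical names; the pattern mirrors the doRound example on Slide 18):

    type textfile;

    // Atomic function: wraps and invokes an application program;
    // it executes on a compute node.
    app (textfile out) sortFile (textfile in) {
      sort "-o" @out @in;
    }

    // Composite function: invokes other functions; it runs in the
    // Swift engine, and single assignment makes each element a future.
    (textfile sorted[ ]) sortAll (textfile inputs[ ]) {
      foreach f, i in inputs {
        sorted[i] = sortFile(f);
      }
    }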

SLIDE 10

A simple Swift script

To run the ImageMagick app "convert":

    convert -rotate 180 $in $out

the Swift script is:

    type imagefile;   // Declare a "file" type.

    app (imagefile output) rotate (imagefile input) {
      convert "-rotate" 180 @input @output;
    }

    imagefile image <"m101.2010.0601.jpg">;
    imagefile newimage <"output.jpg">;

    newimage = rotate(image);

SLIDE 11

Execution is driven by data flow

    (int result) myproc (int input)
    {
      j = f(input);
      k = g(input);
      result = j + k;
    }

j=f() and k=g() are computed in parallel. This parallelism is automatic, based on futures, and works recursively down the script's call graph.

SLIDE 12

Parallelism via foreach { }

    type imagefile;   // Declare a "file" type.

    app (imagefile output) rotate (imagefile input) {
      convert "-rotate" "180" @input @output;
    }

    // Map inputs from the local directory.
    imagefile observations[ ] <simple_mapper; prefix="m101-raw">;
    // Name outputs based on index.
    imagefile flipped[ ] <simple_mapper; prefix="m101-flipped">;

    // Process all dataset members in parallel.
    foreach obs, i in observations {
      flipped[i] = rotate(obs);
    }

SLIDE 13

Many domains process structured datasets

[Workflow figure, as on Slide 7: many data files flowing through many application programs (align_warp, reslice, softmean, slicer, convert).]

SLIDE 14

Swift Data Mapping

    type Study   { Group g[ ]; }
    type Group   { Subject s[ ]; }
    type Subject { Volume anat; Run run[ ]; }
    type Run     { Volume v[ ]; }
    type Volume  { Image img; Header hdr; }

A mapping function or script connects the on-disk data layout to Swift's in-memory data model, as sketched below.
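A hedged sketch of declaring and traversing such a mapped dataset (study_mapper.sh and process() are hypothetical names; Image and Header are assumed declared as simple file types):

    // Map the on-disk study layout into the nested in-memory
    // structure via an external mapper script (hypothetical name).
    Study study <ext; exec="study_mapper.sh">;

    // Every run of every subject in every group can be
    // processed in parallel.
    foreach g in study.g {
      foreach s in g.s {
        foreach r in s.run {
          process(s.anat, r);   // process() is an illustrative app function
        }
      }
    }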
SLIDE 15

Application: Protein structure prediction

[Figure: the Swift app function "predict()" takes a Protein (FASTA sequence file), temperature t, and dt, runs the PSim application, and produces pg (PDB geometry) and a log file.]

To run:

    psim -s 1ubq.fas -pdb p -temp 100.0 -inc 25.0 >log

In Swift code:

    app (PDB pg, File log) predict (Protein seq, Float temp, Float dt) {
      psim "-s" @seq.fasta "-pdb" @pg "-temp" temp "-inc" dt;
    }

    Protein p <ext; exec="Pmap", id="1ubq">;
    PDB structure;
    File log;

    (structure, log) = predict(p, 100., 25.);

Encapsulation is the key to transparent distribution, parallelization, and provenance.

SLIDE 16

Parallelism via foreach { }

    foreach sim in [1:1000] {
      (structure[sim], log[sim]) = predict(p, 100., 25.);
    }
    result = analyze(structure);

The 1000 predict() calls run in parallel; analyze() starts only once all elements of structure[] are set.

SLIDE 17

Application: 3D Protein structure prediction

    type Fasta;      // Primary protein sequence file in FASTA format
    type SecSeq;     // Secondary structure file
    type RamaMap;    // "Ramachandran" mapping info files
    type RamaIndex;
    type ProtGeo;    // PDB-format file: protein geometry (3D atom coords)
    type SimLog;

    type Protein {        // Input file struct to protein simulator
      Fasta fasta;        // sequence to predict structure of
      SecSeq secseq;      // initial secondary structure to use
      ProtGeo native;     // 3D structure from experimental data, when known
      RamaMap map;
      RamaIndex index;
    }

    type PSimCf {         // Science configuration parameters to simulator
      float st;
      float tui;
      float coeff;
    }

    type ProtSim {        // Output file struct from protein simulator
      ProtGeo pgeo;
      SimLog log;
    }

SLIDE 18

Protein structure prediction

    app (ProtGeo pgeo) predict (Protein pseq)
    {
      PSim @pseq.fasta @pgeo;
    }

    (ProtGeo pg[ ]) doRound (Protein p, int n) {
      foreach sim in [0:n-1] {
        pg[sim] = predict(p);
      }
    }

    Protein p <ext; exec="Pmap", id="1af7">;
    ProtGeo structure[ ];
    int nsim = 10000;
    structure = doRound(p, nsim);

SLIDE 19

Protein structure prediction

    (ProtSim psim[ ]) doRoundCf (Protein p, int n, PSimCf cf) {
      foreach sim in [0:n-1] {
        psim[sim] = predictCf(p, cf.st, cf.tui, cf.coeff);
      }
    }

    (boolean converged) analyze (ProtSim prediction[ ], int r, int numRounds)
    {
      if ( r == (numRounds-1) ) {
        converged = true;
      } else {
        converged = test_convergence(prediction);
      }
    }

SLIDE 20

Protein structure prediction

    ItFix (Protein p, int nsim, int maxr, float temp, float dt)
    {
      ProtSim prediction[ ][ ];
      boolean converged[ ];
      PSimCf config;

      config.st = temp;
      config.tui = dt;
      config.coeff = 0.1;

      iterate r {
        prediction[r] = doRoundCf(p, nsim, config);
        converged[r] = analyze(prediction[r], r, maxr);
      } until ( converged[r] );
    }

SLIDE 21

Protein structure prediction

    Sweep()
    {
      int nSim = 1000;
      int maxRounds = 3;
      Protein pSet[ ] <ext; exec="Protein.map">;
      float startTemp[ ] = [ 100.0, 200.0 ];
      float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];

      foreach p, pn in pSet {
        foreach t in startTemp {
          foreach d in delT {
            ItFix(p, nSim, maxRounds, t, d);
          }
        }
      }
    }

    Sweep();

10 proteins × 1000 simulations × 3 rounds × 2 temps × 5 deltas = 300K tasks

SLIDE 22

Using Swift

[Figure: a Swift script runs from a submit host (laptop, login host, …) with a site list and app list; file transport moves data files (f1, f2, f3) between a data server and compute nodes or clouds, where apps a1 and a2 execute; workflow status and logs, including a provenance log, are kept at the submit host.]

Swift is a self-contained application with cluster and grid client code: download, untar, and run.

SLIDE 23

Architecture for petascale scripting

[Figure: a Swift script feeds a Falkon client (load balancing), which dispatches tasks to Falkon services on the BG/P I/O processors and on to BG/P processor sets; small, fast, local memory-based filesystems complement a shared global filesystem.]

SLIDE 24

Collective data management is critical for petascale

  • Applies "scatter/gather" concepts at the file-management level
  • Seeks to avoid contention, maximize parallelism, and exploit petascale interconnects
    – Broadcast common files to compute nodes
    – Place per-task data on local (RAM) filesystems
    – Gather output into larger sets (in time/space)
    – Aggregate small local filesystems into one large striped filesystem
  • Still in the research phase: both paradigm and architectures

SLIDE 25

Collective data management

[Figure: a distributor stages data (1–5) from the global filesystem through intermediate filesystems (IFS nodes) to compute-node local filesystems (CN LFS); a collector gathers outputs back along the same path.]

SLIDE 26

Performance: Molecular dynamics on BG/P

935,803 DOCK jobs with Falkon on BG/P in 2 hours

SLIDE 27

Performance: SEM for fMRI on Constellation

418K SEM tasks with Swift/Coasters on Ranger in 41 hours

SLIDE 28

Performance: Proteomics on BG/P

4,127 PTMap jobs with Swift/Falkon on BG/P in 3 minutes

SLIDE 29

Scaling the many-task model

[Figure: a client (master) application drives a master graph executor, which coordinates many compute units, each running its own graph executor, over ultra-fast message queues within an extreme-scale computing complex; a virtual data store is backed by global persistent storage.]

SLIDE 30

Scaling many‐task computing

  • ADLB: tasks can be lightweight functions
    – Retains the RPC model of input-process-output
    – Fast, distributed, asynchronous load balancing
  • Multi-level task manager
    – Must scale to massive computing complexes
  • Transparent distributed management of local storage
    – Leverage local (RAM) filesystems, aggregate them, and make access more transparent through DHT methods

SLIDE 31

Conclusion: Motivation for Swift

  • Enhance scientific productivity
    – Location- and paradigm-independence: the same scripts run on workstations, clusters, clouds, grids, and petascale supercomputers
    – Automation of dataflow, resource selection, and error recovery
  • Enable and motivate collaboration
    – Community libraries of techniques, protocols, and methods
    – Designed to record the provenance of all data produced, to facilitate scientific processes

SLIDE 32

  • Swift is a parallel scripting system for Grids and clusters
    – for loosely coupled applications: application and utility programs linked by exchanging files
  • Swift is easy to write: a simple, high-level, C-like functional language
    – Small Swift scripts can do large-scale work
  • Swift is easy to run: it contains all services for running Grid workflow in one Java application
    – Untar and run; acts as a self-contained Grid client
  • Swift is fast: Karajan provides Swift a powerful, efficient, scalable, and flexible execution engine
    – Scaling close to 1M tasks; 0.5M in live science work, and growing
  • Swift usage is growing:
    – applications in neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, and more

SLIDE 33

To learn more and try Swift…

  • www.ci.uchicago.edu/swift
    – Quick Start Guide: http://www.ci.uchicago.edu/swift/guides/quickstartguide.php
    – User Guide: http://www.ci.uchicago.edu/swift/guides/userguide.php
    – Introductory Swift Tutorials: http://www.ci.uchicago.edu/swift/docs/index.php

SLIDE 34

[Image: Swift featured in IEEE Computer, Nov. 2009]

SLIDE 35

Acknowledgments

  • The Swift effort is supported in part by NSF grants OCI-721939, OCI-0944332, and PHY-636265, NIH DC08638, and the UChicago/Argonne Computation Institute
  • The Swift team (present and former):
    – Ben Clifford, Allan Espinosa, Ian Foster, Mihael Hategan, Ioan Raicu, Sarah Kenny, Mike Wilde, Justin Wozniak, Zhao Zhang, Yong Zhao
  • Java CoG Kit, used by Swift, developed by:
    – Mihael Hategan, Gregor von Laszewski, and many collaborators
  • Falkon software
    – developed by Ioan Raicu and Zhao Zhang
  • ZeptoOS
    – Kamil Iskra, Kazutomo Yoshii, and Pete Beckman
  • Scientific application collaborators and users
    – U. Chicago Open Protein Simulator Group (Karl Freed, Tobin Sosnick, Glen Hocky, Joe DeBartolo, Aashish Adhikari)
    – U. Chicago Radiology and Human Neuroscience Lab (Dr. S. Small)
    – SEE/CIM-EARTH/Econ: Joshua Elliott, Meredith Franklin, Todd Munson, Tib Stef-Praun
    – PTMap: Yingming Zhao, Yue Chen