STORK: Making Data Placement a First Class Citizen in the Grid



SLIDE 1

STORK: Making Data Placement a First Class Citizen in the Grid

Tevfik Kosar, University of Wisconsin-Madison. May 25th, 2004, CERN

SLIDE 2

Need to move data around...

[Figure: data volumes growing from terabytes (TB) to petabytes (PB)]

SLIDE 3

While doing this...

  • Locate the data
  • Access heterogeneous resources
  • Face all kinds of failures
  • Allocate and de-allocate storage
  • Move the data
  • Clean up everything

All of these need to be done reliably and efficiently!

SLIDE 4

Stork

A scheduler for data placement activities in the Grid. What Condor is for computational jobs, Stork is for data placement. Stork comes with a new concept:

“Make data placement a first class citizen in the Grid.”

SLIDE 5

Outline

  • Introduction
  • The Concept
  • Stork Features
  • Big Picture
  • Case Studies
  • Conclusions

SLIDE 6

The Concept

  • Stage-in
  • Execute the Job
  • Stage-out

Individual Jobs

SLIDE 7

The Concept

  • Stage-in
  • Execute the Job
  • Stage-out

Allocate space for input & output data → Stage-in → Execute the job → Release input space → Stage-out → Release output space

Individual Jobs

SLIDE 8

The Concept

  • Stage-in
  • Execute the Job
  • Stage-out

Allocate space for input & output data → Stage-in → Execute the job → Release input space → Stage-out → Release output space

[Diagram: the chain split into Data Placement Jobs and Computational Jobs]
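The split above can be sketched as a typed job chain. This is illustrative only; the names and routing function are not Stork's actual API.

```python
# Illustrative sketch: the single computational job becomes a chain in which
# storage management and transfers are explicit, first-class data placement
# jobs. Names here are hypothetical, not Stork's interface.
DATA_PLACEMENT = "data_placement"   # handled by a data placement scheduler
COMPUTATIONAL  = "computational"    # handled by a computation scheduler

job_chain = [
    ("allocate space for input & output data", DATA_PLACEMENT),
    ("stage-in",                               DATA_PLACEMENT),
    ("execute the job",                        COMPUTATIONAL),
    ("release input space",                    DATA_PLACEMENT),
    ("stage-out",                              DATA_PLACEMENT),
    ("release output space",                   DATA_PLACEMENT),
]

def jobs_for(scheduler_kind):
    """Route each step in the chain to the scheduler responsible for it."""
    return [step for step, kind in job_chain if kind == scheduler_kind]
```

Only one step in the chain is computational; the other five are data placement jobs, which is why they deserve their own scheduler.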

SLIDE 9

DAGMan

The Concept

Condor Job Queue

DaP A A.submit
DaP B B.submit
Job C C.submit
…
Parent A child B
Parent B child C
Parent C child D, E
…

Stork Job Queue

DAG specification

[Diagram: example DAG with nodes A-F]

SLIDE 10

Why Stork?

Stork understands the characteristics and semantics of data placement jobs, and can make smart scheduling decisions for reliable and efficient data placement.

SLIDE 11

Understanding Job Characteristics & Semantics

Job_type = transfer, reserve, release? Source and destination hosts, files, protocols to use?

  • Determine concurrency level
  • Can select alternate protocols
  • Can select alternate routes
  • Can tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
  • …

SLIDE 12

Support for Heterogeneity

Protocol translation using the Stork memory buffer.

SLIDE 13

Support for Heterogeneity

Protocol translation using the Stork Disk Cache.

SLIDE 14

Flexible Job Representation and Multilevel Policy Support

[
  Type     = “Transfer”;
  Src_Url  = “srb://ghidorac.sdsc.edu/kosart.condor/x.dat”;
  Dest_Url = “nest://turkey.cs.wisc.edu/kosart/x.dat”;
  … … … …
  Max_Retry  = 10;
  Restart_in = “2 hours”;
]
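A minimal sketch of how a scheduler might honor the Max_Retry and Restart_in policy fields above. The function and its arguments are hypothetical, not Stork's interface.

```python
import time

def run_with_policy(transfer, max_retry=10, restart_in_seconds=2 * 3600,
                    sleep=time.sleep):
    """Retry a transfer up to max_retry times, waiting restart_in_seconds
    between attempts (mirrors the Max_Retry / Restart_in fields above).
    `transfer` is any callable returning True on success."""
    for attempt in range(1, max_retry + 1):
        if transfer():
            return attempt                    # number of attempts used
        if attempt < max_retry:
            sleep(restart_in_seconds)         # back off before retrying
    raise RuntimeError("transfer failed after %d attempts" % max_retry)
```

Passing the sleep function in makes the policy testable without real waiting; a transfer that fails twice and then succeeds returns 3.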

SLIDE 15

Failure Recovery and Efficient Resource Utilization

Fault tolerance

  • Just submit a bunch of data placement jobs, and then go away...

Control the number of concurrent transfers from/to any storage system

  • Prevents overloading

Space allocation and de-allocation

  • Make sure space is available
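One way to realize the concurrency control above is a per-host cap. This throttle class is an illustrative sketch under that assumption, not Stork's implementation.

```python
import threading

class TransferThrottle:
    """Cap the number of concurrent transfers per storage system, so a
    single server is never overloaded. Hypothetical sketch, not Stork code."""
    def __init__(self, max_concurrent_per_host):
        self._limit = max_concurrent_per_host
        self._lock = threading.Lock()
        self._active = {}                 # host -> in-flight transfer count

    def try_acquire(self, host):
        """Return True and count the transfer if the host has capacity;
        return False (keep the job queued) if the host is saturated."""
        with self._lock:
            if self._active.get(host, 0) >= self._limit:
                return False
            self._active[host] = self._active.get(host, 0) + 1
            return True

    def release(self, host):
        """Called when a transfer to/from `host` completes."""
        with self._lock:
            self._active[host] -= 1
```

Jobs denied a slot simply stay in the queue until a running transfer to the same host releases one.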

SLIDE 16

Run-time Adaptation

Dynamic protocol selection

[
  dap_type = “transfer”;
  src_url  = “drouter://slic04.sdsc.edu/tmp/test.dat”;
  dest_url = “drouter://quest2.ncsa.uiuc.edu/tmp/test.dat”;
  alt_protocols = “nest-nest, gsiftp-gsiftp”;
]

[
  dap_type = “transfer”;
  src_url  = “any://slic04.sdsc.edu/tmp/test.dat”;
  dest_url = “any://quest2.ncsa.uiuc.edu/tmp/test.dat”;
]
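The alt_protocols fallback above can be sketched as an ordered trial of protocol pairs. Both helper names and the transfer_ok callback are hypothetical, standing in for an actual test transfer.

```python
def parse_alt_protocols(spec):
    """Turn an alt_protocols string like "nest-nest, gsiftp-gsiftp"
    into an ordered list of (src_protocol, dest_protocol) pairs."""
    return [tuple(pair.strip().split("-")) for pair in spec.split(",")]

def pick_protocol(pairs, transfer_ok):
    """Try each protocol pair in order and return the first that works,
    a sketch of dynamic protocol selection with fallback."""
    for src_proto, dest_proto in pairs:
        if transfer_ok(src_proto, dest_proto):
            return src_proto, dest_proto
    raise RuntimeError("no usable protocol pair")
```

If the preferred pair (here nest-nest) fails, the job falls through to gsiftp-gsiftp instead of failing outright.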

SLIDE 17

Run-time Adaptation

Run-time Protocol Auto-tuning

[
  link     = “slic04.sdsc.edu – quest2.ncsa.uiuc.edu”;
  protocol = “gsiftp”;
  bs     = 1024KB;   // block size
  tcp_bs = 1024KB;   // TCP buffer size
  p      = 4;        // number of parallel streams
]
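The tcp_bs value being tuned here is conventionally sized by the bandwidth-delay product (buffer ≈ bandwidth × round-trip time). The sketch below only illustrates that rule of thumb; it is not Stork's actual tuning procedure, and the example numbers are assumptions.

```python
def tuned_tcp_buffer_kb(bandwidth_mbit_s, rtt_ms):
    """Bandwidth-delay-product rule of thumb behind TCP buffer tuning:
    the buffer must hold all bits in flight, bandwidth * RTT."""
    bits_in_flight = bandwidth_mbit_s * 1e6 * (rtt_ms / 1000.0)
    return bits_in_flight / 8 / 1024       # bits -> bytes -> KB

# e.g. an assumed 100 Mbit/s path with ~80 ms RTT needs roughly a 1 MB buffer,
# which is why a default 64 KB tcp_bs can leave most of the link idle.
```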

SLIDE 18

Outline

  • Introduction
  • The Concept
  • Stork Features
  • Big Picture
  • Case Studies
  • Conclusions

SLIDE 19

[Diagram: USER hands JOB DESCRIPTIONS to a PLANNER, which produces an Abstract DAG]

SLIDE 20

[Diagram: the PLANNER's Abstract DAG goes to a WORKFLOW MANAGER, which consults RLS to produce a Concrete DAG]

SLIDE 21

[Diagram: the WORKFLOW MANAGER dispatches jobs to a COMPUTATION SCHEDULER (driving COMPUTE NODES) and a DATA PLACEMENT SCHEDULER (driving STORAGE SYSTEMS)]

SLIDE 22

[Diagram: the previous picture extended with D. JOB and C. JOB LOG FILES feeding a POLICY ENFORCER]

SLIDE 23

[Diagram: extended further with a DATA MINER and NETWORK MONITORING TOOLS driving a FEEDBACK MECHANISM]

SLIDE 24

[Diagram: the same architecture instantiated with concrete components: PEGASUS (planner), DAGMAN (workflow manager), STORK (data placement scheduler), CONDOR/CONDOR-G (computation scheduler), and a MATCHMAKER, together with RLS, the D. JOB and C. JOB LOG FILES, the DATA MINER, NETWORK MONITORING TOOLS, and the FEEDBACK MECHANISM]

SLIDE 25

Outline

  • Introduction
  • The Concept
  • Stork Features
  • Big Picture
  • Case Studies
  • Conclusions

SLIDE 26

Case Study I: SRB-UniTree Data Pipeline

Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA. A data transfer pipeline created with Stork.

[Diagram: pipeline from the SRB Server through the SDSC Cache and the NCSA Cache to the UniTree Server, driven by the Submit Site]

SLIDE 27

Failure Recovery

[Figure: transfer timeline annotated with failures: UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot & UW CS network outage; software problem]

SLIDE 28

Case Study - II

SLIDE 29

Dynamic Protocol Selection

SLIDE 30

Runtime Adaptation

Before Tuning:

  • parallelism = 1
  • block_size = 1 MB
  • tcp_bs = 64 KB

After Tuning:

  • parallelism = 4
  • block_size = 1 MB
  • tcp_bs = 256 KB
SLIDE 31

Case Study - III

User submits a DAG at the management site.

[Diagram: input data flows from WCER to a Staging Site @UW (split files, Condor file transfer mechanism / DiskRouter / Globus-url-copy), jobs run on the Condor Pool @UW and other Condor pools, outputs are merged and sent via SRB put to the SRB Server @SDSC, with other replicas; arrows mark control flow, input data flow, output data flow, and processing, with steps numbered 1-8]

SLIDE 32

Conclusions

  • Regard data placement as individual jobs.
  • Treat computational and data placement jobs differently.
  • Introduce a specialized scheduler for data placement.
  • Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, and reliable and efficient transfers.

SLIDE 33

Future work

  • Enhanced interaction between Stork and higher-level planners (better coordination of CPU and I/O)
  • Interaction between multiple Stork servers and job delegation
  • Enhanced authentication mechanisms
  • More run-time adaptation

SLIDE 34

Related Publications

  • Tevfik Kosar and Miron Livny. “Stork: Making Data Placement a First Class Citizen in the Grid”. In Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
  • George Kola, Tevfik Kosar and Miron Livny. “A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication”. To appear in Proceedings of the 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004.
  • Tevfik Kosar, George Kola and Miron Livny. “A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment”. In Proceedings of the 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
  • George Kola, Tevfik Kosar and Miron Livny. “Run-time Adaptation of Grid Data Placement Jobs”. In Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.

SLIDE 35

You don’t have to FedEx your data anymore... Stork delivers it for you!

For more information:

  • Email: kosart@cs.wisc.edu
  • http://www.cs.wisc.edu/condor/stork