STORK: Making Data Placement a First Class Citizen in the Grid
Tevfik Kosar, University of Wisconsin-Madison
May 25th, 2004, CERN

Need to move data around..
[Chart: data volumes growing from terabytes (TB) into petabytes (PB)]
While doing this..
- Locate the data
- Access heterogeneous resources
- Deal with all kinds of failures
- Allocate and de-allocate storage
- Move the data
- Clean up everything
All of these need to be done reliably and efficiently!
Stork
- A scheduler for data placement activities in the Grid
- What Condor is for computational jobs, Stork is for data placement
- Stork comes with a new concept:
"Make data placement a first class citizen in the Grid."
Outline
- Introduction
- The Concept
- Stork Features
- Big Picture
- Case Studies
- Conclusions
The Concept
Individual Jobs:
- Stage-in
- Execute the Job
- Stage-out
The Concept
Individual Jobs:
- Allocate space for input & output data
- Stage-in
- Execute the job
- Release input space
- Stage-out
- Release output space
The Concept
The same workflow, split into two job classes:
- Data placement jobs: allocate space for input & output data, stage-in, release input space, stage-out, release output space
- Computational jobs: execute the job
The Concept
DAG specification (DaP entries are data placement jobs, Job entries are computational jobs):

DaP A A.submit
DaP B B.submit
Job C C.submit
.....
Parent A child B
Parent B child C
Parent C child D, E
.....

[Diagram: DAGMan reads the DAG specification (nodes A-F) and dispatches computational jobs such as C to the Condor job queue, and data placement jobs such as E to the Stork job queue.]
Why Stork?
- Stork understands the characteristics and semantics of data placement jobs.
- It can make smart scheduling decisions for reliable and efficient data placement.
Understanding Job Characteristics & Semantics
- Job_type = transfer, reserve, release?
- Source and destination hosts, files, protocols to use?
This lets Stork:
- Determine concurrency level
- Select alternate protocols
- Select alternate routes
- Tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
- ...
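The decisions listed above can be sketched in code. A minimal, hypothetical example of turning a job's characteristics into transfer parameters (the field names and the wide-area heuristic are assumptions for illustration, not Stork internals):

```python
# A sketch (not Stork's actual code) of how a data placement scheduler
# might turn job characteristics into tunable transfer parameters.
# All field names and the heuristic below are illustrative assumptions.

def plan_transfer(job):
    """Pick tunable parameters for a "transfer"-type data placement job."""
    plan = {}
    if job["job_type"] == "transfer":
        # Crude heuristic: different second-level domains => wide-area link,
        # which benefits from more streams and larger TCP buffers.
        wide_area = (job["src_host"].split(".")[-2:] !=
                     job["dest_host"].split(".")[-2:])
        plan["parallel_streams"] = 4 if wide_area else 1
        plan["tcp_buffer_kb"] = 1024 if wide_area else 64
        plan["io_block_kb"] = 1024
    return plan

job = {"job_type": "transfer",
       "src_host": "slic04.sdsc.edu",
       "dest_host": "quest2.ncsa.uiuc.edu"}
print(plan_transfer(job))
# {'parallel_streams': 4, 'tcp_buffer_kb': 1024, 'io_block_kb': 1024}
```

A real scheduler would of course consult measured link characteristics rather than host names, as the run-time adaptation slides later show.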
Support for Heterogeneity
Protocol translation using the Stork memory buffer.
Support for Heterogeneity
Protocol translation using the Stork disk cache.
Flexible Job Representation and Multilevel Policy Support
[
  Type     = "Transfer";
  Src_Url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ......
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
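The Max_Retry and Restart_in attributes above suggest retry semantics along these lines. A hedged sketch (the function and its behavior are invented for illustration, not Stork's implementation):

```python
# A sketch of the retry semantics suggested by Max_Retry / Restart_in
# (assumed behavior for illustration, not Stork's actual code).

def run_with_retries(transfer, max_retry, restart_in_hours,
                     sleep=lambda seconds: None):
    """Attempt a transfer up to max_retry times, waiting between attempts."""
    for attempt in range(1, max_retry + 1):
        if transfer():
            return attempt                      # attempts it took to succeed
        sleep(restart_in_hours * 3600)          # wait before restarting
    raise RuntimeError("transfer failed after %d attempts" % max_retry)

# Usage: a flaky transfer that succeeds on the third try.
attempts = iter([False, False, True])
print(run_with_retries(lambda: next(attempts), max_retry=10, restart_in_hours=2))
# 3
```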
Failure Recovery and Efficient Resource Utilization
- Fault tolerance
  - Just submit a bunch of data placement jobs, and then go away..
- Control number of concurrent transfers from/to any storage system
  - Prevents overloading
- Space allocation and de-allocations
  - Make sure space is available
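The per-storage-system concurrency limit described above could be implemented roughly like this (an assumed design, not Stork's actual mechanism; class and host names are illustrative):

```python
# A sketch (assumed design, not Stork's code) of capping concurrent
# transfers per storage system to avoid overloading any one server.

import threading

class TransferThrottle:
    def __init__(self, max_concurrent):
        self._max = max_concurrent
        self._sems = {}                  # one semaphore per storage host
        self._lock = threading.Lock()

    def _sem_for(self, host):
        with self._lock:
            return self._sems.setdefault(host,
                                         threading.Semaphore(self._max))

    def run(self, host, transfer):
        # Blocks if `host` already has max_concurrent transfers in flight.
        with self._sem_for(host):
            return transfer()

throttle = TransferThrottle(max_concurrent=2)
print(throttle.run("unitree.ncsa.uiuc.edu", lambda: "done"))
# done
```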
Run-time Adaptation
Dynamic protocol selection:
[
  dap_type = "transfer";
  src_url  = "drouter://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
  alt_protocols = "nest-nest, gsiftp-gsiftp";
]
[
  dap_type = "transfer";
  src_url  = "any://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
]
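The alt_protocols list above implies an ordered-fallback loop: try each protocol pair in turn until one completes the transfer. A sketch under that assumption (all function names are illustrative):

```python
# A sketch of the alt_protocols fallback shown above (assumed behavior,
# not Stork's actual code): try each protocol pair until one succeeds.

def transfer_with_fallback(protocol_pairs, do_transfer):
    failures = []
    for pair in protocol_pairs:
        try:
            do_transfer(pair)
            return pair                     # first protocol pair that worked
        except OSError as err:
            failures.append((pair, err))    # remember why it failed, move on
    raise RuntimeError("all protocols failed: %r" % failures)

# Usage: DiskRouter is down, so the transfer falls back to nest.
def fake_transfer(pair):
    if pair == "drouter-drouter":
        raise OSError("diskrouter unreachable")

print(transfer_with_fallback(
    ["drouter-drouter", "nest-nest", "gsiftp-gsiftp"], fake_transfer))
# nest-nest
```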
Run-time Adaptation
Run-time protocol auto-tuning:
[
  link     = "slic04.sdsc.edu - quest2.ncsa.uiuc.edu";
  protocol = "gsiftp";
  bs       = 1024KB;   // block size
  tcp_bs   = 1024KB;   // TCP buffer size
  p        = 4;
]
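One plausible way such tuned values are derived (an assumption; the deck does not say how Stork computes them) is the bandwidth-delay product rule for TCP buffer sizing:

```python
# One common auto-tuning rule (an assumption for illustration, not
# necessarily what Stork uses): size the TCP buffer to the link's
# bandwidth-delay product so a single stream can fill the pipe.

def tune_tcp_buffer_kb(bandwidth_mbps, rtt_ms):
    """TCP buffer (KB) = bandwidth * round-trip time (the BDP)."""
    bdp_bits = bandwidth_mbps * 1e6 * (rtt_ms / 1000.0)
    return int(bdp_bits / 8 / 1024)

# e.g. a 100 Mb/s path with 60 ms RTT (illustrative numbers for an
# SDSC-NCSA style wide-area link):
print(tune_tcp_buffer_kb(100, 60))
# 732
```

When the buffer cannot be made that large, adding parallel streams (the p attribute above) is the usual workaround.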
Outline
- Introduction
- The Concept
- Stork Features
- Big Picture
- Case Studies
- Conclusions
Big Picture
[Architecture diagram, built up across several slides:
- The USER hands JOB DESCRIPTIONS to a PLANNER, which produces an Abstract DAG.
- A WORKFLOW MANAGER, consulting the RLS, turns the Abstract DAG into a Concrete DAG.
- The workflow manager submits computational jobs (C. JOB) to a COMPUTATION SCHEDULER driving the COMPUTE NODES, and data placement jobs (D. JOB) to a DATA PLACEMENT SCHEDULER driving the STORAGE SYSTEMS.
- Both schedulers write LOG FILES, which a POLICY ENFORCER consults; a DATA MINER and NETWORK MONITORING TOOLS close the loop through a FEEDBACK MECHANISM.
- Concrete instantiation: PEGASUS as planner, DAGMAN as workflow manager, CONDOR/CONDOR-G as computation scheduler, STORK as data placement scheduler, and the MATCHMAKER as policy enforcer.]
Outline
- Introduction
- The Concept
- Stork Features
- Big Picture
- Case Studies
- Conclusions
Case Study I: SRB-UniTree Data Pipeline
- Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA
- A data transfer pipeline created with Stork
[Diagram: SRB Server -> SDSC Cache -> NCSA Cache -> UniTree Server, coordinated from a Submit Site.]
Failure Recovery
[Throughput timeline annotated with the incidents the pipeline recovered from: UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot & UW CS network outage; software problem.]
Case Study - I I
Dynamic Protocol Selection
Runtime Adaptation
Before tuning:
- parallelism = 1
- block_size = 1 MB
- tcp_bs = 64 KB
After tuning:
- parallelism = 4
- block_size = 1 MB
- tcp_bs = 256 KB
Case Study - III
User submits a DAG at the management site (@UW).
[Pipeline diagram with numbered steps 1-8: input data moves from WCER to the Staging Site @UW via the Condor file transfer mechanism; files are split, processed on the Condor Pool @UW and other Condor pools, and merged; output is shipped via DiskRouter/globus-url-copy and SRB put to the SRB Server @SDSC. Legend: control flow, input data flow, output data flow, processing, other replicas.]
Conclusions
- Regard data placement as individual jobs.
- Treat computational and data placement jobs differently.
- Introduce a specialized scheduler for data placement.
- Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, and reliable and efficient transfers.
Future work
- Enhanced interaction between Stork and higher-level planners
  - better coordination of CPU and I/O
- Interaction between multiple Stork servers and job delegation
- Enhanced authentication mechanisms
- More run-time adaptation
Related Publications
- Tevfik Kosar and Miron Livny. "Stork: Making Data Placement a First Class Citizen in the Grid". In Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
- George Kola, Tevfik Kosar and Miron Livny. "A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication". To appear in Proceedings of the 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004.
- Tevfik Kosar, George Kola and Miron Livny. "A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment". In Proceedings of the 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
- George Kola, Tevfik Kosar and Miron Livny. "Run-time Adaptation of Grid Data Placement Jobs". In Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.
You don't have to FedEx your data anymore.. Stork delivers it for you!
For more information:
- Email: kosart@cs.wisc.edu
- http://www.cs.wisc.edu/condor/stork