SLIDE 1

Harnessing Grid Resources with Data-Centric Task Farms

Ioan Raicu
Distributed Systems Laboratory
Computer Science Department
University of Chicago

Committee Members:
Ian Foster: University of Chicago, Argonne National Laboratory
Rick Stevens: University of Chicago, Argonne National Laboratory
Alex Szalay: The Johns Hopkins University

Candidacy Exam, December 12th, 2007

SLIDE 2

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 3

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 4

Motivating Example: AstroPortal Stacking Service

  • Purpose

– On-demand “stacks” of random locations within ~10TB dataset

  • Challenge

– Rapid access to 10-10K “random” files
– Time-varying load

  • Solution

– Dynamic acquisition of compute and storage resources

[Figure: AstroPortal stacking over Sloan data: multiple image cutouts summed into a single stacked image, accessed via a web page or web service]

SLIDE 5

Challenge #1: Long Queue Times

  • Wait queue times are typically longer than the job duration times

[Figure: wait queue times on the SDSC DataStar 1024-processor cluster, 2004]

SLIDE 6

Challenge #2: Slow Job Dispatch Rates

  • Production LRMs achieve ~1 job/sec dispatch rates

  • Job durations needed for 90% efficiency:
    – Production LRMs: 900 sec
    – Development LRMs: 100 sec
    – Experimental LRMs: 50 sec
    – 1~10 sec should be possible

| System | Comments | Throughput (tasks/sec) |
|---|---|---|
| Condor (v6.7.2) - Production | Dual Xeon 2.4GHz, 4GB | 0.49 |
| PBS (v2.1.8) - Production | Dual Xeon 2.4GHz, 4GB | 0.45 |
| Condor (v6.7.2) - Production | Quad Xeon 3 GHz, 4GB | 2 |
| Condor (v6.8.2) - Production | | 0.42 |
| Condor (v6.9.3) - Development | | 11 |
| Condor-J2 - Experimental | Quad Xeon 3 GHz, 4GB | 22 |

[Figure: efficiency vs. task length (0.001 to 100000 sec) for a medium-size grid site (1K processors), at dispatch rates of 1 task/sec (e.g., PBS, Condor 6.8), 10 tasks/sec (e.g., Condor 6.9.2), 100 tasks/sec, 500 tasks/sec (e.g., Falkon), 1K, 10K, 100K, and 1M tasks/sec]
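The curves in this figure follow from a simple dispatch-rate argument: a centralized dispatcher issuing r tasks/sec to N processors can keep each processor busy for at most one task length out of every N/r seconds, giving efficiency ≈ min(1, t·r/N). A minimal sketch of that back-of-the-envelope calculation (the formula is our reading of the plot, not stated on the slide):

```java
// Back-of-the-envelope dispatch efficiency: a single dispatcher at `rate`
// tasks/sec revisits each of `n` processors every n/rate seconds, and a
// task of length taskLen keeps a processor busy for taskLen of that window.
// (Assumed model, inferred from the slide's efficiency plot.)
public class DispatchEfficiency {
    static double efficiency(double taskLen, double rate, int n) {
        return Math.min(1.0, taskLen * rate / n);
    }

    public static void main(String[] args) {
        int n = 1000; // medium-size grid site
        // 90% efficiency requires taskLen >= 0.9 * n / rate:
        for (double rate : new double[] {1, 11, 22, 500}) {
            System.out.printf("%.0f tasks/sec -> tasks of %.0f sec for 90%%%n",
                              rate, 0.9 * n / rate);
        }
        // 1 task/sec -> 900 sec, matching the production-LRM figure above;
        // 11 and 22 tasks/sec land near the 100 sec and 50 sec estimates.
    }
}
```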

SLIDE 7

Challenge #3: Poor Scalability of Shared File Systems

100 1000 10000 100000 1000000 1 10 100 1000 Number of Nodes Throughput (Mb/s)

GPFS R LOCAL R GPFS R+W LOCAL R+W

  • GPFS vs. LOCAL
    – Read throughput
      • 1 node: 0.48 Gb/s vs. 1.03 Gb/s (2.15x)
      • 160 nodes: 3.4 Gb/s vs. 165 Gb/s (48x)
    – Read+write throughput
      • 1 node: 0.2 Gb/s vs. 0.39 Gb/s (1.95x)
      • 160 nodes: 1.1 Gb/s vs. 62 Gb/s (55x)
    – Metadata (mkdir / rm -rf)
      • 1 node: 151/sec vs. 199/sec (1.3x)
      • 160 nodes: 21/sec vs. 31840/sec (1516x)
SLIDE 8

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 9

Hypothesis

  • Important concepts related to the hypothesis
    – Workload: a complex query (or set of queries) decomposable into simpler tasks to answer broader analysis questions
    – Data locality is crucial to the efficient use of large-scale distributed systems for scientific and data-intensive applications
    – Allocate computational and caching storage resources, co-scheduled to optimize workload performance

“Significant performance improvements can be obtained in the analysis of large datasets by leveraging information about data analysis workloads rather than individual data analysis tasks.”

SLIDE 10

Proposed Solution: Part 1 Abstract Model and Validation

  • AMDASK: An Abstract Model for DAta-centric taSK farms
    – Task Farm: a common parallel pattern that drives independent computational tasks
    – Models the efficiency of data analysis workloads for the split/merge class of applications
    – Captures the following data diffusion properties:
      • Resources are acquired in response to demand
      • Data and applications diffuse from archival storage to new resources
      • Resource “caching” allows faster responses to subsequent requests
      • Resources are released when demand drops
      • Considers both data and computations to optimize performance

  • Model Validation
    – Implement the abstract model in a discrete event simulation
    – Validate the model with statistical methods (R² statistic, residual analysis)

SLIDE 11

Proposed Solution: Part 2 Practical Realization

  • Falkon: a Fast and Light-weight tasK executiON framework
    – Light-weight task dispatch mechanism
    – Dynamic resource provisioning to acquire and release resources
    – Data management capabilities including data-aware scheduling
    – Integration into Swift to leverage many Swift-based applications

  • Applications cover many domains: astronomy, astro-physics, medicine, chemistry, and economics

SLIDE 12

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 13

AMDASK: Base Definitions

  • Data Stores: Persistent & Transient

– Store capacity, load, ideal bandwidth, available bandwidth

  • Data Objects:

– Data object size, data object’s storage location(s), copy time

  • Transient resources: compute speed, resource state

  • Task: application, input/output data
SLIDE 14

AMDASK: Execution Model Concepts

  • Dispatch Policy

– next-available, first-available, max-compute-util, max-cache-hit

  • Caching Policy

– random, FIFO, LRU, LFU

  • Replay policy
  • Data Fetch Policy

– Just-in-Time, Spatial Locality

  • Resource Acquisition Policy

– one-at-a-time, additive, exponential, all-at-once, optimal

  • Resource Release Policy

– distributed, centralized
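To make the dispatch policies above concrete, here is a hedged sketch of how a data-aware dispatcher might pick an executor under the max-cache-hit policy, falling back to next-available on a cache miss. This is our illustration of the technique; the class and field names are hypothetical, not Falkon source.

```java
import java.util.*;

// Illustrative data-aware dispatch decision (hypothetical names, not Falkon code).
class Executor {
    final String id;
    final Set<String> cachedObjects = new HashSet<>(); // locally cached data objects
    boolean idle = true;
    Executor(String id) { this.id = id; }
}

class DataAwareScheduler {
    // max-cache-hit: prefer the executor that already caches the task's
    // input object (maximizing cache hits, even if the task must queue
    // there); otherwise behave like next-available and use any idle executor.
    static Executor dispatch(String inputObject, List<Executor> executors) {
        for (Executor e : executors)
            if (e.cachedObjects.contains(inputObject))
                return e;               // cache hit
        for (Executor e : executors)
            if (e.idle)
                return e;               // cache miss: next available executor
        return null;                    // nothing available: task stays queued
    }
}
```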

SLIDE 15

AMDASK: Performance Efficiency Model

  • B: Average Task Execution Time
    – K: stream of tasks
    – µ(k): execution time of task k

    B = \frac{1}{|K|} \sum_{k \in K} \mu(k)

  • V: Workload Execution Time
    – A: arrival rate of tasks
    – T: set of transient resources

    V = \max\left(\frac{B}{|T|}, \frac{1}{A}\right) \cdot |K|

  • W: Workload Execution Time with Overheads

    W = \max\left(\frac{Y}{|T|}, \frac{1}{A}\right) \cdot |K|

  • Y: Average Task Execution Time with Overheads
    – ο(k): dispatch overhead of task k
    – ς(δ,τ): time to fetch data object δ to transient resource τ

    Y = \begin{cases}
          \frac{1}{|K|} \sum_{k \in K} \left[\mu(k) + o(k)\right], & \delta \in \varphi(\tau) \\
          \frac{1}{|K|} \sum_{k \in K} \left[\mu(k) + o(k) + \varsigma(\delta,\tau)\right], & \delta \notin \varphi(\tau),\ \delta \in \Omega
        \end{cases}

    (φ(τ): data objects cached on transient resource τ; Ω: the persistent store)
SLIDE 16

AMDASK: Performance Efficiency Model

  • Efficiency

    E = \frac{V}{W} =
        \begin{cases}
          1, & Y \le \frac{|T|}{A} \\
          \max\left(\frac{B}{Y}, \frac{|T|}{A \cdot Y}\right), & Y > \frac{|T|}{A}
        \end{cases}

  • Speedup

    S = E \cdot |T|

  • Optimizing Efficiency
    – Easy to maximize either efficiency or speedup independently
    – Harder to maximize both at the same time
    – Find the smallest number of transient resources |T| that maximizes speedup × efficiency

SLIDE 17

Performance Efficiency Model Example: 1K CPU Cluster

  • Application: Angle - distributed data mining
  • Testbed Characteristics:

– Computational resources: 1024
– Transient resource bandwidth: 10 MB/sec
– Persistent store bandwidth: 426 MB/sec

  • Workload:

– Number of tasks: 128K
– Arrival rate: 1000/sec
– Average task execution time: 60 sec
– Data object size: 40 MB
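Plugging these parameters into the reconstructed model yields the efficiency/speedup curves on the next slide. A hedged worked sketch follows; the cold-cache fetch-time model (persistent store bandwidth shared evenly across |T| nodes, every task fetching its object) and the 10 ms dispatch overhead are our assumptions, not values from the slide.

```java
// Hedged worked example of the AMDASK efficiency model with the 1K-CPU
// cluster parameters above. The fetch-time model and dispatch overhead
// are assumptions made for illustration only.
public class AmdaskExample {
    public static void main(String[] args) {
        double mu = 60.0;        // average task execution time (sec)
        double A = 1000.0;       // arrival rate (tasks/sec)
        long   K = 128 * 1024;   // number of tasks
        double objMB = 40.0;     // data object size (MB)
        double nodeBW = 10.0;    // transient resource bandwidth (MB/sec)
        double storeBW = 426.0;  // persistent store bandwidth (MB/sec)
        double o = 0.01;         // assumed dispatch overhead (sec)

        for (int T = 1; T <= 1024; T *= 2) {
            // worst case: every task fetches its data object from the
            // persistent store, whose bandwidth is shared across T nodes
            double fetch = objMB / Math.min(nodeBW, storeBW / T);
            double B = mu;                          // ideal avg task time
            double Y = mu + o + fetch;              // avg task time w/ overheads
            double V = Math.max(B / T, 1 / A) * K;  // ideal workload time
            double W = Math.max(Y / T, 1 / A) * K;  // workload time w/ overheads
            double E = V / W;                       // efficiency
            double S = E * T;                       // speedup
            System.out.printf("|T|=%4d  E=%.3f  S=%8.1f  S*E=%9.1f%n",
                              T, E, S, S * E);
        }
        // The smallest |T| maximizing S*E is the operating point sought on
        // the previous slide; swapping in the BG/P parameters (two slides
        // ahead) reproduces the 128K-CPU example.
    }
}
```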

SLIDE 18

Performance Efficiency Model Example: 1K CPU Cluster

[Figure: efficiency (left axis) and speedup (right axis) vs. number of processors (1-1024): Efficiency, Speedup, Speedup×Efficiency]

  • Falkon on ANL/UC TG Site: peak dispatch throughput: 500/sec; scalability: 50~500 CPUs; peak speedup: 623x
  • PBS on ANL/UC TG Site: peak dispatch throughput: 1/sec; scalability: <50 CPUs; peak speedup: 54x

SLIDE 19

Performance Efficiency Model Example: 128K CPU BG/P

  • Application: Angle - distributed data mining
  • Testbed Characteristics:

– Computational resources: 128K
– Transient resource bandwidth: 10 MB/sec
– Persistent store bandwidth: 10000 MB/sec

  • Workload:

– Number of tasks: 128K
– Arrival rate: 10000/sec
– Average task execution time: 60 sec
– Data object size: 40 MB

SLIDE 20

Performance Efficiency Model Example: 128K CPU BG/P

[Figures: efficiency (left axis) and speedup (right axis) vs. number of processors (1-131072): Efficiency, Speedup, Speedup×Efficiency]

  • Falkon on BG/P: peak dispatch throughput per I/O node: 40/sec; scalability: 1000~10000 CPUs; peak speedup: 13153x
  • Condor on BG/P: peak dispatch throughput per I/O node: 1/sec; scalability: ?; peak speedup: 9353x

SLIDE 21

Model Validation: Simulations

  • Implement the abstract model in a discrete event simulation

  • Simulation parameters:
    – number of storage and computational resources
    – communication costs
    – management overhead
    – workloads (inter-arrival rates, query complexity, data set properties, and data locality)

  • Model Validation
    – R² statistic
    – residual analysis
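Of the two validation checks, the R² statistic is the quantitative one: it measures how much of the variance in the measured execution times the model's predictions explain. A minimal sketch of that computation (generic formula; the method and variable names are ours):

```java
// R^2 = 1 - SS_res / SS_tot between measured values and model predictions.
// The residuals (measured - predicted) can additionally be plotted and
// inspected for structure, which is the residual-analysis step.
public class ModelValidation {
    static double rSquared(double[] measured, double[] predicted) {
        double mean = 0;
        for (double m : measured) mean += m;
        mean /= measured.length;

        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < measured.length; i++) {
            double r = measured[i] - predicted[i];  // residual
            ssRes += r * r;
            ssTot += (measured[i] - mean) * (measured[i] - mean);
        }
        return 1 - ssRes / ssTot;  // 1.0 => model explains all variance
    }
}
```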

SLIDE 22

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 23

Falkon: a Fast and Light-weight tasK executiON framework

  • Goal: enable the rapid and efficient execution of many independent jobs on large compute clusters

  • Combines three components:
    – a streamlined task dispatcher able to achieve order-of-magnitude higher task dispatch rates than conventional schedulers (Challenge #1)
    – resource provisioning through multi-level scheduling techniques (Challenge #2)
    – data diffusion and data-aware scheduling to leverage the co-located computational and storage resources (Challenge #3)

SLIDE 24

Falkon: The Streamlined Task Dispatcher

  • Tier 1: Dispatcher

– GT4 Web Service accepting task submissions from clients and sending them to available executors

  • Tier 2: Executor

– Run tasks on local resources

  • Provisioner

– Static and dynamic resource provisioning

[Figure: Falkon architecture: clients submit tasks via WS to the dispatcher, which drives executors 1..n across compute resources 1..m; the provisioner allocates the compute resources]

SLIDE 25

Falkon: The Streamlined Task Dispatcher

  • Falkon Message Exchanges

– Description:
  {1}: task(s) submit
  {2}: task(s) submit confirmation
  {3}: notification for work
  {4}: request for task(s)
  {5 or 7}: dispatch task(s)
  {6}: deliver task(s) results to service
  {8}: notification for task result(s)
  {9}: request for task result(s)
  {10}: deliver task(s) results to client

– Worst case (process tasks individually, no optimizations):
  • 4 WS messages ({1,2}, {4,5}, {6,7}, {9,10}) and 2 notifications ({3}, {8}) per task

SLIDE 26

Falkon: The Streamlined Task Dispatcher

– Bundling
  • Include multiple tasks per communication message
– Piggy-Backing
  • Attach the next task to the acknowledgement of the previous task
  • Include data management information in the task description and acknowledgement messages

  • Falkon Message Exchange Enhancements
    – Message reduction (from 10 messages per task in the worst case):
      • General lower bound: 2+c messages per task, where c is a small positive value
      • Application-specific lower bound: 0+c messages per task, where c is a small positive value
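A hedged sketch of the piggy-backing idea (our illustration of the technique, not Falkon's actual wire protocol): once an executor is working, delivering a bundle of results and fetching the next bundle of tasks collapse into a single exchange, which is how steady-state messaging approaches the lower bounds above.

```java
import java.util.*;

// Illustrative bundling + piggy-backing (hypothetical names, not Falkon
// source): the dispatcher's reply to a result delivery carries the next
// bundle of tasks, so no separate notification/request round trips are
// needed per task in steady state.
class Dispatcher {
    private final Deque<String> queue = new ArrayDeque<>();

    void submit(List<String> tasks) { queue.addAll(tasks); }  // messages {1,2}

    // Executor delivers a bundle of results and, in the same exchange,
    // receives its next bundle of tasks.
    synchronized List<String> deliverResultsAndFetch(List<String> results,
                                                     int bundleSize) {
        // ... record results and notify the client asynchronously ...
        List<String> next = new ArrayList<>();
        while (next.size() < bundleSize && !queue.isEmpty())
            next.add(queue.poll());
        return next;  // empty list => no work available, executor idles
    }
}
```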
SLIDE 27

Falkon: Resource Provisioning

  • Steps:
    0. provisioner registration
    1. task(s) submit
    2. resource allocation to GRAM
    3. resource allocation to LRM
    4. executor registration
    5. notification for work
    6. pick up task(s)
    7. deliver task(s) results
    8. notification for task(s) result
    9. pick up task(s) results
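A hedged sketch of the dynamic provisioning policy these steps enable, as suggested by the later "Falkon-X" experiments (where X is the idle time before a node is released); the trigger condition and the GRAM/LRM placeholders are our illustration, not the provisioner's actual code.

```java
import java.util.*;

// Illustrative dynamic provisioner (assumed policy, not Falkon source):
// acquire executors through GRAM/LRM while tasks are queued, and release
// executors idle longer than idleTimeout. "Falkon-X" in the later results
// corresponds to idleTimeout = X seconds (15, 60, 120, 180, or infinity).
class Provisioner {
    private final int maxNodes;
    private final long idleTimeoutMs;
    private final Map<String, Long> idleSince = new HashMap<>();
    private int allocated = 0;

    Provisioner(int maxNodes, long idleTimeoutMs) {
        this.maxNodes = maxNodes;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    // Executors report state changes to the provisioner.
    synchronized void markIdle(String id, long nowMs) { idleSince.put(id, nowMs); }
    synchronized void markBusy(String id) { idleSince.remove(id); }

    // Called periodically with the current dispatch queue length.
    synchronized void adjust(int queuedTasks, long nowMs) {
        while (queuedTasks > 0 && allocated < maxNodes) {
            allocateViaGram();           // steps 2-3: GRAM -> LRM allocation
            allocated++;
            queuedTasks--;               // assume one node per queued task
        }
        idleSince.entrySet().removeIf(e -> {
            if (nowMs - e.getValue() > idleTimeoutMs) {
                release(e.getKey());     // deregister, return node to the LRM
                allocated--;
                return true;
            }
            return false;
        });
    }

    void allocateViaGram() { /* placeholder: submit allocation via GRAM */ }
    void release(String executorId) { /* placeholder: deallocate the node */ }
}
```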

SLIDE 28

Falkon: Data Diffusion

  • Resources acquired in response to demand
  • Data and applications diffuse from archival storage to newly acquired resources
  • Resource “caching” allows faster responses to subsequent requests
    – Cache eviction strategies: RANDOM, FIFO, LRU, LFU
  • Resources are released when demand drops

[Figure: data diffusion architecture: task dispatcher and data-aware scheduler over provisioned resources, idle resources, and persistent storage / shared file system]
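As a concrete illustration of one of the eviction strategies named above, here is a minimal LRU sketch for a per-executor data object cache (our example; the slides do not show Falkon's cache implementation):

```java
import java.util.*;

// Minimal LRU eviction for a per-executor data cache: an access-ordered
// LinkedHashMap evicts the least recently used entry once the cache holds
// more than `capacity` objects. Keys are data object identifiers; values
// could be local file paths of the cached copies.
class LruDataCache extends LinkedHashMap<String, String> {
    private final int capacity;   // max number of cached data objects

    LruDataCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true => LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // evict the LRU entry when over capacity
    }
}
```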

SLIDE 29

Falkon: Data Diffusion

  • Considers both data and computations to optimize performance
  • Decreases dependency on a shared file system
    – Theoretical linear scalability with compute resources
    – Significantly increases metadata creation and/or modification performance
  • Completes the “data-centric task farm” realization

SLIDE 30

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 31

Related Work: Task Farms

  • [Casanova99]: Adaptive Scheduling for Task Farming with Grid Middleware
  • [Heymann00]: Adaptive Scheduling for Master-Worker Applications on the Computational Grid
  • [Danelutto04]: Adaptive Task Farm Implementation Strategies
  • [González-Vélez05]: An Adaptive Skeletal Task Farm for Grids
  • [Petrou05]: Scheduling Speculative Tasks in a Compute Farm
  • [Reid06]: Task farming on Blue Gene

Conclusion: none addressed the proposed “data-centric” aspect of task farms

SLIDE 32

Related Work: Task Dispatch

  • [Zhou92]: LSF – Load Sharing Cluster Management
  • [Bode00]: PBS – Portable Batch Scheduler and Maui Scheduler
  • [Anderson04]: BOINC – Task Distribution for Volunteer Computing
  • [Thain05]: Condor
  • [Robinson07]: Condor-J2 – Turning Cluster Management into Data Management

Conclusion: related work is several orders of magnitude slower

SLIDE 33

Related Work: Resource Provisioning

  • [Appleby01]: Oceano – SLA Based Management of a Computing Utility
  • [Frey02, Mehta06]: Condor glide-ins
  • [Walker06]: MyCluster (based on Condor glide-ins)
  • [Ramakrishnan06]: Grid Hosting with Adaptive Resource Control
  • [Bresnahan06]: Provisioning of bandwidth
  • [Singh06]: Simulations

Conclusion: our approach allows dynamic resizing of the resource pool (independent of application logic) based on system load, and makes use of light-weight task dispatch

SLIDE 34

Related Work: Data Management

  • [Beynon01]: DataCutter
  • [Ranganathan03]: Simulations
  • [Ghemawat03,Dean04,Chang06]: BigTable, GFS, MapReduce
  • [Liu04]: GridDB
  • [Chervenak04,Chervenak06]: RLS (Replica Location Service), DRS (Data Replication Service)

  • [Tatebe04,Xiaohui05]: GFarm
  • [Branco04,Adams06]: DIAL/ATLAS

Conclusion: Our work focuses on the co-location of storage and computations close to each other (i.e. on the same physical resource) while operating in a dynamic environment.

SLIDE 35

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 36

Completed Milestones

  • Abstract task farm model [Dissertation Proposal 2007]
  • Practical Realization: Falkon

– Task Dispatcher [Globus Incubator 2007, SC07]
– Resource Provisioning [SC07, TG07]
– Data Diffusion [NSF06, MSES07]
– Swift Integration [SWF07, NOVA08]

  • Applications [NASA06, TG06, SC06, NASA07, SWF07, NOVA08]

– AstroPortal, Montage, fMRI, MolDyn, Econ

SLIDE 37

Completed Milestones: Dispatcher Throughput

[Figures: dispatcher throughput. Throughput (tasks/sec) vs. number of executors (32-256) for WS calls (no security), Falkon (no security), and Falkon (GSISecureConversation); throughput vs. 1000-5000 executors; and sustained throughput (raw and 60-sec average) with cumulative tasks completed over time for runs of ~2M and ~1M tasks]

SLIDE 38

Completed Milestones: Dispatcher Performance Profiling

[Figures: per-task CPU time (ms) breakdown and throughput (tasks/sec) vs. number of executors (2-200): task submit (client → service), notification for task availability (service → executor), task dispatch (service → executor), task results (executor → service), notifications for task results (service → client), and WS communication]

  • GT: Java WS-Core 4.0.4
  • Java: Sun JDK 1.6
  • Machine Hardware: Dual Xeon 3GHz CPUs with HT
  • Machine OS: Linux 2.6.13-15.16-smp
  • Executors Location: ANL/UC TG Site, 100 dual Xeon 2.4GHz CPU nodes, ~2ms latency

  • Workload: 10000 tasks, “/bin/sleep 0”

[Figure: per-task CPU time (ms) breakdown and throughput (tasks/sec) vs. bundle size (1-128), same categories as above]

SLIDE 39

Completed Milestones: Resource Provisioning

  • End-to-end execution time:
    – 1260 sec in the ideal case
    – reduced from 4904 sec (GRAM+PBS) to 1276 sec (Falkon-∞)

  • Average task queue time:
    – 42.2 sec in the ideal case
    – reduced from 611 sec (GRAM+PBS) to 43.5 sec (Falkon-∞)

  • Trade-off:
    – resource utilization vs. execution efficiency

|                        | GRAM+PBS | Falkon-15 | Falkon-60 | Falkon-120 | Falkon-180 | Falkon-∞ | Ideal (32 nodes) |
|---|---|---|---|---|---|---|---|
| Time to complete (sec) | 4904 | 1754 | 1680 | 1507 | 1484 | 1276 | 1260 |
| Resource utilization   | 30%  | 89%  | 75%  | 65%  | 59%  | 44%  | 100% |
| Execution efficiency   | 26%  | 72%  | 75%  | 84%  | 85%  | 99%  | 100% |
| Resource allocations   | 1000 | 11   | 9    | 7    | 6    |      |      |

[Figure: number of machines and number of tasks per stage (stages 1-18)]

  • 18 stages
  • 1,000 tasks
  • 17,820 CPU seconds
  • 1,260 sec total time on 32 machines

[Figures: allocated, registered, and active executors over time for the Ideal, Falkon-15, and Falkon-180 configurations]

|                      | GRAM+PBS | Falkon-15 | Falkon-60 | Falkon-120 | Falkon-180 | Falkon-∞ | Ideal (32 nodes) |
|---|---|---|---|---|---|---|---|
| Queue time (sec)     | 611.1 | 87.3  | 83.9  | 74.7  | 44.4  | 43.5  | 42.2  |
| Execution time (sec) | 56.5  | 17.9  | 17.9  | 17.9  | 17.9  | 17.9  | 17.8  |
| Execution time %     | 8.5%  | 17.0% | 17.6% | 19.3% | 28.7% | 29.2% | 29.7% |

SLIDE 40

Completed Milestones: Data Diffusion

[Figures: read and read+write throughput (Mb/s) vs. number of nodes (1-64), comparing: 1. model (local disk); 2. model (shared file system); 3. Falkon (next-available policy); 4. Falkon (next-available policy) + wrapper; 5. Falkon (first-available policy, 0% locality); 6. Falkon (first-available policy, 100% locality); 7. Falkon (max-compute-util policy, 0% locality); 8. Falkon (max-compute-util policy, 100% locality)]

  • No Locality
  • No Locality

– Modest loss of read performance for small # of nodes (<8)
– Comparable performance with large # of nodes
– Modest gains in read+write performance

  • Locality

– Significant gains in performance beyond 8 nodes
– Data-aware scheduler achieves near-optimal performance and scalability

SLIDE 41

Completed Milestones: Falkon Integration with Swift

[Figure: Swift architecture: SwiftScript is compiled into an abstract computation using the Virtual Data Catalog; the execution engine (Karajan with the Swift runtime) handles scheduling, status reporting, and provenance collection across virtual nodes; provisioning is performed by the Falkon resource provisioner, including Amazon EC2 resources]

| Application | #Tasks/workflow | #Stages |
|---|---|---|
| ATLAS: High Energy Physics Event Simulation | 500K | 1 |
| fMRI DBIC: AIRSN Image Processing | 100s | 12 |
| FOAM: Ocean/Atmosphere Model | 2000 | 3 |
| GADU: Genomics | 40K | 4 |
| HNL: fMRI Aphasia Study | 500 | 4 |
| NVO/NASA: Photorealistic Montage/Morphology | 1000s | 16 |
| QuarkNet/I2U2: Physics Science Education | 10s | 3~6 |
| RadCAD: Radiology Classifier Training | 1000s | 5 |
| SIDGrid: EEG Wavelet Processing, Gaze Analysis | 100s | 20 |
| SDSS: Coadd, Cluster Search | 40K, 500K | 2, 8 |
| SDSS: Stacking, AstroPortal | 10Ks~100Ks | 2~4 |
| MolDyn: Molecular Dynamics | 1Ks~20Ks | 8 |

SLIDE 42

Completed Milestones: fMRI Application

| Input Data Size (Volumes) | GRAM (s) | GRAM/Clustering (s) | Falkon (s) |
|---|---|---|---|
| 120 | 1239 | 456  | 120 |
| 240 | 2510 | 866  | 327 |
| 360 | 3683 | 992  | 546 |
| 480 | 4808 | 1123 | 678 |

  • GRAM vs. Falkon: 85%~90% lower run time
  • GRAM/Clustering vs. Falkon: 40%~74% lower run time
SLIDE 43

Completed Milestones: Montage Application

[Figure: time (s) per Montage component (mProject, mDiff/mFit, mBackground, mAdd(sub), mAdd, total) for GRAM/Clustering, MPI, and Falkon]

  • GRAM/Clustering vs. Falkon: 57% lower application run time
  • MPI* vs. Falkon: 4% higher application run time
  • * MPI should be lower bound
SLIDE 44

[Figure: per-task wait queue, execution, and results queue times (sec) vs. task ID for the 20,497-job run]

  • 244 molecules → 20,497 jobs
  • 15,091 seconds on 216 CPUs → 867.1 CPU hours
  • Efficiency: 99.8%
  • Speedup: 206.9x (8.2x faster than GRAM/PBS)
  • 50 molecules with GRAM (4,201 jobs) → 25.3x speedup

Completed Milestones: MolDyn Application

SLIDE 45

Completed Milestones: AstroPortal Application with Data Diffusion

[Figure: stacking time (sec) by original dataset location (web server, shared file system WAN, shared file system LAN, data diffusion) for Zipf (alpha = 0.1, 1.0, 2.0) and uniform object distributions]

[Figure: stacking time (sec) for the first vs. second run, under data diffusion and the LAN shared file system, for uniform and Zipf distributions]

  • No data locality results in 10% lower performance
  • Data locality offers significant (up to 300%) performance improvement with data diffusion

| Stacking Size (# of objects) | Object Distribution | Working Set Size (# of files) | Working Set Size (# of objects) | Locality (accesses per file) |
|---|---|---|---|---|
| 10000 | ZIPF (alpha 0.1) | 1915 | 1924  | 5.22 |
| 10000 | ZIPF (alpha 1.0) | 2755 | 2819  | 3.63 |
| 10000 | ZIPF (alpha 2.0) | 6110 | 6259  | 1.64 |
| 10000 | UNIFORM          | 9771 | 10000 | 1.02 |

SLIDE 46

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 47

Work in Progress

  • Model Validation via Simulations
  • Practical Realization

– Task Dispatcher
– Resource Provisioning
– Data Diffusion
– Performance Evaluation

  • Applications
SLIDE 48

Work in Progress: Simulations

Can data-centric task farms offer good scalability and efficiency?

  • Implement the abstract model in a discrete event simulation
    – GridSim simulator from the GridBus project

  • Model Validation
    – R² statistic
    – residual analysis
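GridSim (a Java toolkit) already supplies the discrete event core, so the simulator will not be written from scratch; purely to illustrate the underlying mechanism, here is a minimal event-queue loop (a generic sketch, not GridSim's API):

```java
import java.util.PriorityQueue;

// Minimal discrete event simulation core: events are processed in
// timestamp order, and handlers may schedule further events (e.g., task
// arrival -> dispatch -> completion). Generic sketch, not the GridSim API.
public class DesCore {
    record Event(double time, Runnable handler) {}

    private final PriorityQueue<Event> queue =
        new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
    private double now = 0;

    public void schedule(double delay, Runnable handler) {
        queue.add(new Event(now + delay, handler));
    }

    public void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time();      // advance the simulated clock
            e.handler().run();   // process the event
        }
    }

    public double now() { return now; }
}
```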

SLIDE 49

Future Work: 3-Tier Architecture

  • Overview
    – Tier 1: Forwarder
    – Tier 2: Dispatcher
    – Tier 3: Executor

  • Ensures that Falkon works with local access policies (firewalls, private IP spaces, etc.)
  • Increases performance and scalability
    – IBM BlueGene/P with 128K CPU cores
    – large multi-site Grids such as OSG and TeraGrid

SLIDE 50

Work in Progress: Provisioning

  • Extend the provisioner to support:
    – Virtual Workspace Service (opens the door to EC2)
    – Cobalt LRM on the BG/P
    – multiple sites, making use of wait queue prediction mechanisms

  • Establish a permanent PlanetLab testbed
SLIDE 51

Work in Progress: Data Diffusion

  • Update Swift to use data diffusion
  • Cache eviction policies
  • Data-aware schedulers
  • Hybrid vs. distributed data management
  • Data cache migration on resource de-allocation
  • Explore data pre-fetching policies
SLIDE 52

Work in Progress: Performance Evaluation Metrics

  • Dispatch: task throughput, scalability, resource efficiency
  • Provisioning: queue wait time, dynamic resource provisioning latency, resource wastage
  • Data Management: data caching (cache hits vs. cache misses), scheduling overheads, data management overheads
  • Applications: execution time, speedups
  • Performance Profiling: quantifying communication overhead
SLIDE 53

Work in Progress: Applications & New Science

  • Applications (* via Swift)
    – Astronomy: AstroPortal, Montage*
    – Medicine: fMRI*
    – Chemistry: MolDyn*
    – Economics: Econ*

  • New Science
    – AstroPortal (stacking service) used to find faint objects in the SDSS DR5 dataset
SLIDE 54

Outline

  • 1. Motivation and Challenges
  • 2. Hypothesis & Proposed Solution
    – Abstract Model
    – Practical Realization
  • 3. Related Work
  • 4. Completed Milestones
  • 5. Work in Progress
  • 6. Conclusion & Contributions
SLIDE 55

Conclusions & Contributions

  • Define an abstract model for the performance efficiency of data analysis workloads using data-centric task farms
  • Provide a reference implementation (Falkon)
    – Use a streamlined dispatcher to increase task throughput by several orders of magnitude over traditional LRMs
    – Use multi-level scheduling to reduce perceived wait queue time for tasks to execute on remote resources
    – Address data diffusion through co-scheduling of storage and computational resources to improve performance and scalability
    – Provide the benefits of dedicated hardware without the associated high cost
    – Show flexibility/effectiveness on real-world applications
      • astronomy, chemistry, medicine, and economics
SLIDE 56

More Information

  • Related Projects:
    – Falkon: http://dev.globus.org/wiki/Incubator/Falkon
    – AstroPortal: http://people.cs.uchicago.edu/~iraicu/research/AstroPortal/index.htm
    – Swift: http://www.ci.uchicago.edu/swift/index.php

  • Collaborators (relevant to this proposal):
    – Ian Foster, The University of Chicago & Argonne National Laboratory
    – Alex Szalay, The Johns Hopkins University
    – Rick Stevens, The University of Chicago & Argonne National Laboratory
    – Yong Zhao, Microsoft
    – Mike Wilde, Computation Institute, University of Chicago & Argonne National Laboratory
    – Catalin Dumitrescu, The University of Chicago
    – Zhao Zhang, The University of Chicago

  • Funding:
    – NASA: Ames Research Center, Graduate Student Research Program (GSRP)
    – DOE: Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Dept. of Energy
    – NSF: TeraGrid

SLIDE 57

Publications

1. Y. Zhao, I. Raicu, M. Hategan, M. Wilde, I. Foster. “Swift: Realizing Fast, Reliable, Large Scale Scientific Computation”, under review at Journal of Future Generation Computer Systems.
2. Y. Zhao, I. Raicu, I. Foster, M. Hategan, V. Nefedova, M. Wilde. “Realizing Fast, Scalable and Reliable Scientific Computations in Grid Environments”, to appear as a book chapter in Grid Computing Research Progress, Nova Publisher, 2008.
3. I. Raicu, Y. Zhao, I. Foster, A. Szalay. “A Data Diffusion Approach to Large Scale Scientific Exploration”, Microsoft eScience Workshop at RENCI, 2007.
4. I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, M. Wilde. “Falkon: a Fast and Light-weight tasK executiON framework”, IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC07), 2007.
5. Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. von Laszewski, I. Raicu, T. Stef-Praun, M. Wilde. “Swift: Fast, Reliable, Loosely Coupled Parallel Computation”, IEEE Workshop on Scientific Workflows, 2007.
6. I. Raicu, C. Dumitrescu, I. Foster. “Dynamic Resource Provisioning in Grid Environments”, TeraGrid Conference, 2007.
7. I. Raicu, I. Foster. “Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets: Year 1 Status and Year 2 Proposal”, NASA GSRP Year 1 Progress Report and Year 2 Proposal, Ames Research Center, NASA, February 2007.
8. I. Raicu, I. Foster. “Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets”, NASA GSRP Proposal, Ames Research Center, NASA, February 2006.
9. I. Raicu, I. Foster, A. Szalay. “Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets”, IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC06), 2006.
10. I. Raicu, I. Foster, A. Szalay, G. Turcu. “AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis”, TeraGrid Conference, 2006.
11. A. Szalay, J. Bunn, J. Gray, I. Foster, I. Raicu. “The Importance of Data Locality in Distributed Computing Applications”, NSF Workflow Workshop, 2006.

SLIDE 58

Reports

1. I. Raicu, I. Foster, Z. Zhang. “Enabling Serial Job Execution on the BlueGene Supercomputer with Falkon”, work in progress, http://www.ci.uchicago.edu/wiki/bin/view/VDS/DslCS/Falkon_BG, 2007.
2. I. Raicu, C. Dumitrescu, I. Foster. “Provisioning EC2 Resources”, work in progress, http://www.ci.uchicago.edu/wiki/bin/view/VDS/DslCS/Falkon_EC2, 2007.
3. I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, M. Wilde. “Falkon: A Proposal for Project Globus Incubation”, Globus Incubation Management Project, 2007, http://people.cs.uchicago.edu/~iraicu/research/reports/up/Falkon-GlobusIncubatorProposal_v3.pdf.
4. I. Raicu, Y. Zhao, I. Foster, A. Szalay. “Accelerating Large Scale Scientific Exploration through Data Diffusion”, Technical Report, University of Chicago, 2007, http://people.cs.uchicago.edu/~iraicu/research/reports/up/falkon_data-diffusion_v04.pdf.
5. I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, M. Wilde. “Dynamic Resource Provisioning in Grid Environments”, Technical Report, University of Chicago, 2007, http://people.cs.uchicago.edu/~iraicu/research/reports/up/DRP_v01.pdf.
6. I. Raicu, I. Foster, A. Szalay. “3DcacheGrid: Dynamic Distributed Data Cache Grid Engine”, Technical Report, University of Chicago, 2006, http://people.cs.uchicago.edu/~iraicu/research/reports/up/HPC_SC_2006_v09.pdf.
7. I. Raicu, I. Foster. “Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore”, Technical Report, University of Chicago, 2006, http://people.cs.uchicago.edu/~iraicu/research/reports/up/Storage_Compute_RM_Performance_06.pdf.
8. I. Raicu, I. Foster. “SkyServer Web Service”, Technical Report, University of Chicago, 2006, http://people.cs.uchicago.edu/~iraicu/research/reports/up/SkyServerWS_06.pdf.
9. I. Raicu, I. Foster. “Characterizing the SDSS DR4 Dataset and the SkyServer Workloads”, Technical Report, University of Chicago, 2006, http://people.cs.uchicago.edu/~iraicu/research/reports/up/SkyServer_characterization_2006.pdf.
10. I. Raicu, I. Foster. “Characterizing Storage Resources Performance in Accessing the SDSS Dataset”, Technical Report, University of Chicago, 2005, http://people.cs.uchicago.edu/~iraicu/research/reports/up/astro_portal_report_v1.5.pdf.