A Grid Research Toolbox
The Failure Trace Archive, DGSim, … and Cloud

Paris, France, May 10, 2011

  • A. Iosup, O. Sonmez, N. Yigitbasi, H. Mohamed, S. Anoep, D.H.J. Epema (PDS Group, ST/EWI, TU Delft)
  • M. Jan (LRI/INRIA Futurs Paris, INRIA)
  • H. Li, L. Wolters (LIACS, U. Leiden)
  • I. Raicu, C. Dumitrescu, I. Foster (U. Chicago)


A Layered View of the Grid World

  • Layer 1: Hardware + OS
  • Automated
  • Non-grid (XtreemOS?)
  • Layers 2-4: Grid Middleware Stack
  • Low Level: file transfers, local resource allocation, etc.
  • High Level: grid scheduling
  • Very High Level: application environments (e.g., distributed objects)
  • Automated/user control
  • Simple to complex
  • Layer 5: Grid Applications
  • User control
  • Simple to complex

[Figure: the Grid MW Stack, bottom to top: HW + OS, Grid Low-Level MW, Grid High-Level MW, Grid Very High-Level MW, Grid Applications]


Grid Work: Science or Engineering?

  • Work on Grid Middleware and Applications
  • When is work in grid computing science?
  • Studying systems to uncover their hidden laws
  • Designing innovative systems
  • Proposing novel algorithms
  • Methodological aspects: repeatable experiments to verify and extend hypotheses
  • When is work in grid computing engineering?
  • Showing that the system works in a common case, or in a special case of great importance (e.g., weather prediction)
  • When our students can do it (H. Casanova’s argument)

Grid Research Problem: We Are Missing Both Data and Tools

  • Lack of data
  • Infrastructure
  • number and type of resources, resource availability and failures
  • Workloads
  • arrival process, resource consumption
  • Lack of tools
  • Simulators
  • SimGrid, GridSim, MicroGrid, GangSim, OptorSim, MONARC, …
  • Testing tools that operate in real environments
  • DiPerF, QUAKE/FAIL-FCI

Anecdote: Grids are far from being reliable job execution environments

  • Small Cluster: 99.999% reliable
  • Production Cluster: 5x decrease in failure rate after first year [Schroeder and Gibson, DSN‘06]
  • DAS-2: >10% jobs fail [Iosup et al., CCGrid’06]
  • Server: 99.99999% reliable
  • TeraGrid: 20-45% failures [Khalili et al., Grid’06]
  • Grid3: 27% failures, 5-10 retries [Dumitrescu et al., GCC’05]

Source: dboard-gr.cern.ch, May’07.


The Anecdote at Scale

  • NMI Build-and-Test Environment at U.Wisc.-Madison: 112 hosts, >40 platforms (e.g., X86-32/Solaris/5, X86-64/RH/9)
  • Serves >50 grid middleware packages: Condor, Globus, VDT, gLite, GridFTP, RLS, NWS, INCA(-2), APST, NINF-G, BOINC, …
  • A. Iosup, D.H.J. Epema, P. Couvares, A. Karp, M. Livny, Build-and-Test Workloads for Grid Middleware: Problem, Analysis, and Applications, CCGrid, 2007.


A Grid Research Toolbox

  • Hypothesis: (a) is better than (b).

[Figure: the toolbox workflow around DGSim, steps 1-3; “For scenario 1, …”]


Research Questions


Outline

  • 1. Introduction and Motivation
  • 2. Q1: Exchange Data

1. The Grid Workloads Archive 2. The Failure Trace Archive 3. The Cloud Workloads Archive (?)

  • 3. Q2: System Characteristics

1. Grid Workloads 2. Grid Infrastructure

  • 4. Q3: System Testing and Evaluation

Traces in Distributed Systems Research

  • “My system/method/algorithm is better than yours (on my carefully crafted workload)”
  • Unrealistic (trivial): Prove that “prioritize jobs from users whose name starts with A” is a good scheduling policy
  • Realistic? “85% jobs are short”; “10% Writes”; ...
  • Major problem in Computer Systems research
  • Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs/requests submitted by users for execution
  • Main use: compare and cross-validate new job and resource management techniques and algorithms
  • Major problem: real workload traces from several sources



2.1. The Grid Workloads Archive [1/3]

Content

  • 6 traces online (1.5 yrs, >750K jobs, >250 users)
  • A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.

2.1. The Grid Workloads Archive [2/3]

Approach: Standard Data Format (GWF)

  • Goals
  • Provide a unitary format for grid workloads;
  • Same format in plain text and relational DB (SQLite/SQL92);
  • To ease adoption, base it on the Standard Workload Format (SWF) of the Parallel Workloads Archive.
  • Existing fields
  • Identification data: Job/User/Group/Application ID
  • Time and status: Submit/Start/Finish Time, Job Status and Exit Code
  • Request vs. consumption: CPU/Wallclock/Memory
  • Added fields
  • Job submission site
  • Job structure: bags-of-tasks, workflows
  • Extensions: co-allocation, reservations, others possible
  • A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.

2.1. The Grid Workloads Archive [3/3]

Approach: GWF Example

[Table: example GWF records, with columns Submit, Wait [s], Run, #CPUs Used, Mem [KB] Used, Req #CPUs]

  • A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.
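To make the record layout concrete, here is a minimal Python sketch of reading a GWF-style plain-text trace. The column subset and its order are a hypothetical illustration based on the example fields named on this slide; they are not the official GWF field specification.

```python
# A minimal sketch of reading a GWF-style plain-text trace.
# The columns below are a hypothetical subset inspired by the example
# fields on this slide (Submit, Wait, Run, #CPUs, memory); they are not
# the official GWF field specification.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GwfJob:
    job_id: int
    submit_time: int    # seconds since trace start
    wait_time: int      # seconds
    run_time: int       # seconds
    cpus_used: int
    mem_used_kb: int
    cpus_req: int

def parse_gwf_line(line: str) -> Optional[GwfJob]:
    """Parse one whitespace-separated record; skip comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    return GwfJob(*(int(x) for x in fields[:7]))

def load_trace(path: str) -> list:
    with open(path) as fh:
        return [job for job in map(parse_gwf_line, fh) if job is not None]
```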

2.2. The Failure Trace Archive

Presentation

Types of systems

  • (Desktop) Grids
  • DNS servers
  • HPC Clusters
  • P2P systems

Stats

  • 25 traces
  • 100,000 nodes
  • Decades of operation
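As an illustration of what such availability traces enable, the sketch below derives per-node MTBF and MTTR from a simplified list of failure intervals. The (node_id, fail_start, fail_end) schema is hypothetical, chosen only for this example; it is not the FTA's actual trace format.

```python
# A minimal sketch of deriving MTBF and MTTR per node from a simplified
# failure-interval representation: (node_id, fail_start, fail_end), with
# times in seconds. This schema is hypothetical, not the FTA format.
from collections import defaultdict

def mtbf_mttr(failures):
    """failures: iterable of (node_id, fail_start, fail_end) tuples."""
    by_node = defaultdict(list)
    for node, start, end in failures:
        by_node[node].append((start, end))

    stats = {}
    for node, intervals in by_node.items():
        intervals.sort()
        repair_times = [end - start for start, end in intervals]
        # Time between the end of one failure and the start of the next.
        uptimes = [intervals[i + 1][0] - intervals[i][1]
                   for i in range(len(intervals) - 1)]
        stats[node] = {
            "MTTR": sum(repair_times) / len(repair_times),
            "MTBF": sum(uptimes) / len(uptimes) if uptimes else float("inf"),
        }
    return stats

print(mtbf_mttr([("n1", 100, 160), ("n1", 1000, 1030), ("n2", 50, 70)]))
```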


2.3. The Cloud Workloads Archive [1/2]

One Format Fits Them All

  • Flat format
  • Job and Tasks
  • Summary (20 unique data fields) and Detail (60 fields)
  • Categories of information
  • Shared with GWA, PWA: Time, Disk, Memory, Net
  • Jobs/Tasks that change resource consumption profile
  • MapReduce-specific (two-thirds of data fields)
  • A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10
  • Trace file types: CWJ, CWJD, CWT, CWTD


2.3. The Cloud Workloads Archive [2/2]

The Cloud Workloads Archive

  • Looking for invariants
  • Writes are ~40% of total IO, but the absolute values vary
  • #Tasks/Job and the ratio of Map to all (Map+Reduce) tasks vary
  • Understanding workload evolution

  Trace ID   Total IO [MB]   Rd. [MB]   Wr [%]   HDFS Wr [MB]
  CWA-01     10,934          6,805      38%      1,538
  CWA-02     75,546          47,539     37%      8,563
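A small sketch of the invariant check mentioned above: recomputing the write share of total IO from the table's values. The dictionary layout and field names are illustrative only.

```python
# Recompute the write fraction of total IO per trace, using the values
# from the table above. Variable names are illustrative only.
traces = {
    "CWA-01": {"total_io_mb": 10_934, "read_mb": 6_805},
    "CWA-02": {"total_io_mb": 75_546, "read_mb": 47_539},
}

for name, t in traces.items():
    write_mb = t["total_io_mb"] - t["read_mb"]
    print(f"{name}: Wr = {write_mb / t['total_io_mb']:.0%} of total IO")
# CWA-01: Wr = 38% of total IO
# CWA-02: Wr = 37% of total IO
```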


Outline

  • 1. Introduction and Motivation
  • 2. Q1: Exchange Data

1. The Grid Workloads Archive 2. The Failure Trace Archive 3. The Cloud Workloads Archive (?)

  • 3. Q2: System Characteristics

1. Grid Workloads 2. Grid Infrastructure

  • 4. Q3: System Testing and Evaluation

3.1. Grid Workloads [1/7]

Analysis Summary: Grid workloads differ from, e.g., parallel production environments (HPC)

  • Traces: LCG, Grid3, TeraGrid, and DAS
  • Long traces (6+ months), active environments (500+K jobs per trace, 100s of users), >4 million jobs in total
  • Analysis
  • System-wide, VO, group, and user characteristics
  • Environment and user evolution
  • System performance
  • Selected findings
  • Almost no parallel jobs
  • Top 2-5 groups/users dominate the workloads
  • Performance problems: high job wait times, high failure rates
  • A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, L. Wolters, How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications, Grid 2006.


3.1. Grid Workloads [2/7]

Analysis Summary: Grids vs. Parallel Production Systems

  • Similar CPUTime/Year, 5x larger arrival bursts

[Figure: job arrivals in grids vs. parallel production environments (large clusters, supercomputers); LCG cluster daily peak: 22.5k jobs]

  • A. Iosup, D.H.J. Epema, C. Franke, A. Papaspyrou, L. Schley, B. Song, R. Yahyapour, On Grid Performance Evaluation using Synthetic Workloads, JSSPP’06.


3.1. Grid Workloads [3/7]

More Analysis: Special Workload Components

Bags-of-Tasks (BoTs)

  • BoT = set of jobs that start at most Δs after the first job
  • Parameter Sweep Application = BoT in which all jobs use the same binary

Workflows (WFs)

  • WF = set of jobs with precedence constraints (think Directed Acyclic Graph)
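A minimal sketch of the BoT definition above, assuming one user's jobs are given as a sorted list of submit times (an illustrative representation, not a trace format): a job joins the current BoT if it arrives at most Δ seconds after the BoT's first job, otherwise it starts a new BoT.

```python
# Group one user's jobs into BoTs using the delta threshold from the
# definition above. The plain list of submit times is illustrative.
def group_into_bots(submit_times, delta=120):
    """submit_times: sorted submit times (seconds) of one user's jobs."""
    bots, current = [], []
    for t in submit_times:
        if current and t - current[0] > delta:
            bots.append(current)   # current BoT is complete
            current = []
        current.append(t)
    if current:
        bots.append(current)
    return bots

bots = group_into_bots([0, 5, 10, 400, 405, 2000], delta=120)
print([len(b) for b in bots])   # [3, 2, 1]
```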


3.1. Grid Workloads [4/7]

BoTs are predominant in grids

  • Selected findings
  • Batches predominant in grid workloads; up to 96% CPUTime
  • Average batch size (Δ≤120s) is 15-30 (500 max)
  • 75% of the batches are sized 20 jobs or less

  Trace            Submissions   Jobs          CPU time
  Grid’5000        26k           808k (951k)   193y (651y)
  NorduGrid        50k           738k (781k)   2192y (2443y)
  GLOW (Condor)    13k           205k (216k)   53y (55y)

  • A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol. 4641, pp. 382-393, 2007.


3.1. Grid Workloads [5/7]

Workflows exist, but they seem small

  • Traces
  • Selected findings
  • Loose coupling
  • Graphs with 3-4 levels
  • Average WF size is 30/44 jobs
  • 75%+ of WFs are sized 40 jobs or less; 95% are sized 200 jobs or less
  • S. Ostermann, A. Iosup, R. Prodan, D.H.J. Epema, and T. Fahringer, On the Characteristics of Grid Workflows, CoreGRID Integrated Research in Grid Computing (CGIW), 2008.


3.1. Grid Workloads [6/7]

Modeling Grid Workloads: Feitelson adapted

  • Adapted to grids: percentage of parallel jobs, other values.
  • Validated with 4 grid and 7 parallel production environment traces
  • A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny, Inter-Operating Grids Through Delegated MatchMaking, ACM/IEEE Conference on High Performance Networking and Computing (SC), pp. 13-21, 2007.


3.1. Grid Workloads [7/7]

Modeling Grid Workloads: adding users, BoTs

  • Single arrival process for both BoTs and parallel jobs
  • Reduce over-fitting and complexity of “Feitelson adapted” by removing the RunTime-Parallelism correlated model
  • Validated with 7 grid workloads
  • A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp. 97-108, 2008.
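The sketch below only illustrates the shape of such a BoT-aware workload generator (one arrival process, per-BoT sizes and runtimes). The distributions and parameters are placeholders chosen for illustration; they are not the fitted model from the HPDC'08 paper.

```python
# Shape of a BoT-enabled synthetic workload generator. All distributions
# and parameters below are placeholders, NOT the published model.
import random

def generate_workload(n_bots=100, seed=42):
    rng = random.Random(seed)
    users = [f"user{i}" for i in range(10)]
    t, jobs = 0.0, []
    for bot_id in range(n_bots):
        t += rng.expovariate(1 / 300)                      # placeholder interarrival (s)
        size = max(1, int(rng.lognormvariate(2.5, 1.0)))   # placeholder BoT size
        user = rng.choice(users)
        for k in range(size):
            runtime = rng.lognormvariate(5.0, 1.5)         # placeholder runtime (s)
            jobs.append({"bot": bot_id, "user": user,
                         "submit": t + k, "runtime": runtime})
    return jobs

workload = generate_workload()
print(len(workload), "jobs in", len({j["bot"] for j in workload}), "BoTs")
```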


3.2. Grid Infrastructure [1/5]

Existing resource models and data

  • Compute Resources
  • Commodity clusters [Kee et al., SC’04]
  • Desktop grid resource availability [Kondo et al., FGCS’07]
  • Network Resources
  • Structural generators: GT-ITM [Zegura et al., 1997]
  • Degree-based generators: BRITE [Medina et al., 2001]
  • Storage Resources, other resources
  • ?

Source: H. Casanova


3.2. Grid Infrastructure [2/5]

Resource dynamics in cluster-based grids

  • Environment: Grid’5000 traces
  • jobs 05/2004-11/2006 (30 mo., 950K jobs)
  • resource availability traces 05/2005-11/2006 (18 mo., 600K events)
  • Resource availability model for multi-cluster grids

Grid-level availability: 70%

  • A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep 2007.


3.2. Grid Infrastructure [3/5]

Correlated Failures

  • Correlated failure: a maximal set of failures, ordered by increasing event time, with time parameter Δ, such that for any two successive failures E and F, t(F) - t(E) ≤ Δ, where t(·) returns the timestamp of an event; Δ = 1-3600s.
  • Grid-level view
  • Range: 1-339
  • Average: 11
  • Cluster span
  • Range: 1-3
  • Average: 1.06
  • Failures “stay” within a cluster

[Figure: CDF of the size of correlated failures, grid-level view, with the average marked]

  • A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep 2007.
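A minimal sketch of the grouping rule defined above: scan failure events in time order and extend a group while the gap between successive failures is at most Δ, then report group size and cluster span. The (timestamp, cluster) event tuples are illustrative, not the trace format.

```python
# Group failure events into correlated failures: successive events at
# most delta seconds apart belong to the same group.
def correlated_failures(events, delta=3600):
    """events: iterable of (timestamp, cluster) pairs, any order."""
    events = sorted(events)
    groups, current = [], []
    for ts, cluster in events:
        if current and ts - current[-1][0] > delta:
            groups.append(current)
            current = []
        current.append((ts, cluster))
    if current:
        groups.append(current)
    return groups

groups = correlated_failures(
    [(0, "c1"), (10, "c1"), (20, "c2"), (10_000, "c3")], delta=3600)
for g in groups:
    print("size:", len(g), "cluster span:", len({c for _, c in g}))
# size: 3 cluster span: 2
# size: 1 cluster span: 1
```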


3.2. Grid Infrastructure [4/5]

Dynamics Model

  • Assume no correlation of failure occurrence between clusters
  • Which site/cluster? f_s, the fraction of failures at cluster s
  • Per-cluster statistics: MTBF, MTTR, correlation
  • Weibull distribution for the failure inter-arrival time (IAT)
  • Shape parameter > 1: increasing hazard rate; the longer a node is online, the higher the chance that it will fail
  • A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep 2007.
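To see why a Weibull shape parameter above 1 means an increasing hazard rate, the sketch below evaluates the Weibull hazard function h(t) = (k/λ)(t/λ)^(k-1) at growing node uptimes. The shape and scale values are placeholders, not the parameters fitted in the Grid'07 model.

```python
# Weibull hazard rate h(t) = (k/lam) * (t/lam)**(k-1). For k > 1, h(t)
# grows with t. Parameter values are placeholders for illustration only.
def weibull_hazard(t, k, lam):
    return (k / lam) * (t / lam) ** (k - 1)

k, lam = 1.5, 1000.0            # k > 1: increasing hazard rate
for t in (100, 1_000, 10_000):
    print(f"h({t:>6}) = {weibull_hazard(t, k, lam):.6f}")
# The printed hazard grows with t: the longer a node has been online,
# the more likely it is to fail soon.
```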


3.2. Grid Infrastructure [5/5]

Evolution Model

  • A. Iosup, O. Sonmez, and D. Epema, DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation, Euro-Par 2008.


  • Grid workloads are very different from those of other systems, e.g., parallel production environments (large clusters, supercomputers)

  • Batches of jobs are predominant [Euro-Par’07,HPDC’08]
  • Almost no parallel jobs [Grid’06]
  • Workload model [SC’07, HPDC’08]
  • Clouds? (upcoming)
  • Grid resources are not static
  • Resource dynamics model [Grid’07]
  • Resource evolution model [EuroPar’08]
  • Clouds? [CCGrid’11]
  • Archives: easy to share traces and associated research

Outline

  • 1. Introduction and Motivation
  • 2. Q1: Exchange Data

1. The Grid Workloads Archive 2. The Failure Trace Archive 3. The Cloud Workloads Archive (?)

  • 3. Q2: System Characteristics

1. Grid Workloads 2. Grid Infrastructure

  • 4. Q3: System Testing and Evaluation

4.1. GrenchMark: Testing in LSDCSs

Analyzing, Testing, and Comparing Systems

  • Use cases for automatically analyzing, testing, and

comparing systems (or middleware)

  • Functionality testing and system tuning
  • Performance testing/analysis of applications
  • Reliability testing of middleware
  • For grids and clouds, this problem is difficult!
  • Testing in real environments is difficult/costly/both
  • Grids/clouds change rapidly
  • Validity and reproducibility of tests

4.1. GrenchMark: Testing LSDCSs

Architecture Overview

GrenchMark = Grid Benchmark


4.1. GrenchMark: Testing LSDCSs

… Rather Complex

  • Workload structure
  • User-defined and statistical models
  • Dynamic job arrivals
  • Burstiness and self-similarity
  • Feedback, background load
  • Machine usage assumptions
  • Users, VOs
  • Metrics
  • A(W) Run/Wait/Resp. Time
  • Efficiency, MakeSpan
  • Failure rate [!]
  • Notions
  • Co-allocation, interactive jobs, malleable, moldable, …
  • Measurement methods
  • Long workloads
  • Saturated / non-saturated system
  • Start-up, production, and cool-down scenarios
  • Scaling workload to system
  • Applications
  • Synthetic
  • Real
  • Workload definition language
  • Base language layer
  • Extended language layer
  • Other
  • Can use the same workload for both simulations and real environments
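As one concrete ingredient from the list above, here is a hedged sketch of replaying a workload's arrival process against a real system: sleep until each job's submit offset, then hand the job description to a submitter. The my_submit command is a placeholder for illustration, not GrenchMark's actual submission interface.

```python
# Replay a workload's arrival process in real time. "my_submit" is a
# placeholder submitter command, not part of GrenchMark.
import subprocess
import time

def replay(jobs):
    """jobs: list of (submit_offset_seconds, job_description_file),
    sorted by submit offset."""
    start = time.time()
    for offset, desc in jobs:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)                  # wait until the job's submit time
        subprocess.Popen(["my_submit", desc])  # placeholder submitter

replay([(0, "job-0001.desc"), (2.5, "job-0002.desc"), (3.0, "job-0003.desc")])
```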


4.1. GrenchMark: Testing LSDCSs

Testing a Large-Scale Environment (1/2)

  • Testing a 1500-processors Condor environment
  • Workloads of 1000 jobs, grouped by 2, 10, 20, 50, 100, 200
  • Test finishes 1h after the last submission
  • Results
  • >150,000 jobs submitted
  • >100,000 jobs successfully run, >2 yr CPU time in 1 week
  • 5% of jobs failed (much less than the average on other grids)
  • 25% of jobs did not start in time and were cancelled

4.1. GrenchMark: Testing LSDCSs

Testing a Large-Scale Environment (2/2)

  • Performance metrics: system-, job-, operational-, application-, and service-level


4.1. GrenchMark: Testing in LSDCSs

ServMark: Scalable GrenchMark

  • Blending DiPerF and GrenchMark.
  • Tackles two orthogonal issues:
  • Multi-sourced testing (multi-user scenarios, scalability)
  • Generate and run dynamic test workloads with complex structure (real-world scenarios, flexibility)
  • Adds
  • Coordination and automation layers
  • Fault-tolerance module

[Figure: ServMark combines DiPerF and GrenchMark]


Performance Evaluation of Clouds [1/3]

C-Meter: Cloud-Oriented GrenchMark

  • Yigitbasi et al., C-Meter: A Framework for Performance Analysis of Computing Clouds, Proc. of CCGRID 2009.

Performance Evaluation of Clouds [2/3]

Low Performance for Sci.Comp.

  • Evaluated the performance of resources from four production, commercial clouds
  • GrenchMark for evaluating the performance of cloud resources
  • C-Meter for complex workloads
  • Four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid
  • Finding: cloud performance is low for scientific computing
  • S. Ostermann et al., A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34, pp. 115–131, 2010.
  • A. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing, IEEE TPDS, vol. 22(6), 2011.


Performance Evaluation of Clouds [3/3]

Cloud Performance Variability

  • Long-term performance variability of production cloud services
  • IaaS: Amazon Web Services
  • PaaS: Google App Engine
  • Year-long performance information for nine services
  • Finding: about half of the cloud services investigated exhibit yearly and daily patterns; the impact of performance variability depends on the application.
  • A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid 2011.

[Figure: Amazon S3, GET US HI operations]
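A small sketch of how a daily pattern can be exposed in such data: group samples of a performance metric by hour of day and compare the hourly means. The (timestamp, value) sample layout is illustrative only, not the format of the collected measurements.

```python
# Average a performance metric per hour of day; a large spread between
# hourly means hints at a daily pattern. Sample layout is illustrative.
from collections import defaultdict
from datetime import datetime, timezone

def hourly_profile(samples):
    """samples: iterable of (unix_timestamp, metric_value)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        buckets[hour].append(value)
    return {h: sum(v) / len(v) for h, v in sorted(buckets.items())}

# Synthetic example: 10 days of hourly samples with a built-in daily cycle.
profile = hourly_profile(
    [(1_300_000_000 + i * 3600, 100 + (i % 24)) for i in range(240)])
print(profile)
```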


4.2. DGSim: Simulating Multi-Cluster Grids

Goal and Challenges

  • Simulate various grid resource management architectures
  • Multi-cluster grids
  • Grids of grids (THE grid)
  • Challenges
  • Many types of architectures
  • Generating and replaying grid workloads
  • Management of simulations
  • Many repetitions of a simulation for statistical relevance
  • Simulations with many parameters
  • Managing results (e.g., analysis tools)
  • Enabling collaborative experiments

[Figure: two GRM architectures simulated with DGSim]


4.2. DGSim: Simulating Multi-Cluster Grids

Overview

[Figure: DGSim architecture, a discrete-event simulator]
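A minimal sketch of the discrete-event core on which a simulator like DGSim is built: a priority queue of timestamped events processed in order. This is a generic illustration, not DGSim's actual code.

```python
# A tiny discrete-event simulation core: events are (time, callback)
# pairs kept in a heap and executed in timestamp order.
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []   # heap of (time, seq, callback, args)
        self._seq = 0      # tie-breaker for events at the same time

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)

sim = Simulator()

def job_arrival(job_id, runtime):
    print(f"t={sim.now:.0f}: job {job_id} arrives")
    sim.schedule(runtime, lambda: print(f"t={sim.now:.0f}: job {job_id} finishes"))

sim.schedule(0, job_arrival, 1, 50)
sim.schedule(10, job_arrival, 2, 5)
sim.run()
```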


4.2. DGSim: Simulating Multi-Cluster Grids

Simulated Architectures (Sep 2007)

  • Centralized, independent, hierarchical, decentralized, and hybrid hierarchical/decentralized (simulated with DGSim)
  • A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, SC, 2007.


  • GrenchMark + C-Meter: testing large-scale distributed systems
  • Framework
  • Testing in real environments: performance, reliability, functionality
  • Uniform process: metrics, workloads
  • Real tool available
  • DGSim: simulating multi-cluster grids
  • Many types of architectures
  • Generating and replaying grid workloads
  • Management of the simulations

  • Understanding how real systems work
  • Modeling workloads and infrastructure
  • Compare grids and clouds with other platforms (parallel production env.,…)
  • The Archives: easy to share system traces and associated research
  • Grid Workloads Archive
  • Failure Trace Archive
  • Cloud Workloads Archive (upcoming)
  • Testing/Evaluating Grids/Clouds
  • GrenchMark
  • ServMark: Scalable GrenchMark
  • C-Meter: Cloud-oriented GrenchMark
  • DGSim: Simulating Grids (and Clouds?)

Publications
  • 2006: Grid, CCGrid, JSSPP
  • 2007: SC, Grid, CCGrid, …
  • 2008: HPDC, SC, Grid, …
  • 2009: HPDC, CCGrid, …
  • 2010: HPDC, CCGrid (Best Paper Award), EuroPar, …
  • 2011: IEEE TPDS, IEEE Internet Computing, CCGrid, …


Thank you for your attention! Questions? Suggestions? Observations?

Alexandru Iosup
A.Iosup@tudelft.nl
http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
Parallel and Distributed Systems Group, Delft University of Technology

More info:
  • http://www.st.ewi.tudelft.nl/~iosup/research.html
  • http://www.st.ewi.tudelft.nl/~iosup/research_gaming.html
  • http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html

Do not hesitate to contact me…