SLIDE 1

A Group Membership Service for Large-Scale Grids*

Fernando Castor Filho (1,4), Raphael Y. Camargo (2), Fabio Kon (3), and Augusta Marques (4)

(1) Informatics Center, Federal University of Pernambuco
(2) School of Arts, Sciences, and Humanities, University of São Paulo
(3) Department of Computer Science, University of São Paulo
(4) Department of Computing and Systems, University of Pernambuco

*Supported by CNPq/Brazil, grants #481147/2007-1 and #550895/2007-8

Presented at the Middleware'2008 Workshop on Middleware for Grid Computing. Brussels, Belgium, December 1st, 2008.

SLIDE 2

Faults in Grids

 Important problem
    Wastes computing and network resources
    Wastes time (resources might need to be reserved again)
 Scale worsens matters
    Failures become common events
 Opportunistic grids
    Shared grid infrastructure
    Nodes leave/fail frequently
 Fault tolerance can allow for more efficient use of the grid

SLIDE 3

Achieving Fault Tolerance

 First step: detecting failures...
    And then doing something about them
 Other grid nodes must also be aware
    Otherwise, progress might be hindered
 More generally: each node should have an up-to-date view of group membership
    In terms of correct and faulty processes

SLIDE 4

Requirements for Group Membership in Grids

1. Scalability
2. Autonomy
3. Efficiency
4. Capacity to handle dynamism
5. Platform independence
6. Distribution (decentralization)
7. Ease of use

SLIDE 5

Our Proposal

 A group membership service that addresses the aforementioned requirements
    Very lightweight
    Assumes a crash-recovery fault model
    Deployable on any platform that has an ANSI C compiler
 Leverages recent advances in:
    Gossip/infection-style information dissemination
    Accrual failure detectors

SLIDE 6

Gossip/Infection-Style Information Dissemination

 Based on the way infectious diseases spread
    Or, alternatively, on how gossip is disseminated
 Periodically, each participant randomly infects some of its neighbors (sketched below)
    Infects = passes information that (potentially) modifies their state
 Weakly-consistent protocols
    Sufficient for several practical applications
 Highly scalable and robust
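
As a rough illustration of the idea (not the paper's actual protocol code), one gossip round can be sketched in Lua, the service's implementation language; `peers`, `state`, `send`, and the fanout value are assumptions for illustration:

-- Minimal sketch of one gossip round. Each node keeps a versioned
-- `state` table and a list of `peers`; `send` stands in for the real
-- transport (e.g., a socket call). All names here are illustrative.
local FANOUT = 3 -- how many neighbors get "infected" per round

local function gossip_round(state, peers, send)
  for _ = 1, math.min(FANOUT, #peers) do
    local target = peers[math.random(#peers)] -- pick a random neighbor
    send(target, state)                       -- infect it with our state
  end
end

-- Receiving side: merge the incoming state, keeping the newest version
-- of each entry. Repeated rounds spread an update to all N nodes in
-- O(log N) rounds with high probability.
local function on_gossip(local_state, incoming)
  for key, entry in pairs(incoming) do
    local mine = local_state[key]
    if mine == nil or entry.version > mine.version then
      local_state[key] = entry
    end
  end
end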

SLIDE 7

Accrual Failure Detectors

 Decouple monitoring from interpretation
 Output values on a continuous scale
    Suspicion level
 Eventually strongly accurate failure detectors
    Heartbeat interarrival times define a probability distribution function (see the sketch below)
 Several thresholds can be set
    Each triggers different actions
 As good as “regular” adaptive FDs
    More flexible and easier to use
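
The slides do not give the service's exact estimator, but the general accrual idea can be sketched as follows, assuming heartbeat interarrival times are roughly normally distributed; the logistic approximation of the normal CDF is a simplification, not the service's method:

-- Sketch of an accrual suspicion value in the spirit of phi-style
-- detectors. `samples` holds recent interarrival times (assumed
-- non-empty here).
local function mean_and_stddev(samples)
  local sum = 0
  for _, s in ipairs(samples) do sum = sum + s end
  local mu = sum / #samples
  local var = 0
  for _, s in ipairs(samples) do var = var + (s - mu) ^ 2 end
  return mu, math.sqrt(var / #samples)
end

-- Probability that a heartbeat still arrives although `delta` time has
-- already passed: the complement of the normal CDF, approximated with
-- a logistic function for brevity.
local function p_later(delta, mu, sigma)
  local z = (delta - mu) / math.max(sigma, 1e-6)
  return 1 / (1 + math.exp(1.702 * z))
end

-- Suspicion level on a continuous scale: phi = -log10(P_later).
-- phi >= 1 means roughly 90% confidence that the process has failed,
-- phi >= 2 roughly 99%, and so on; several thresholds can be set on it.
local function phi(now, last_heartbeat, samples)
  local mu, sigma = mean_and_stddev(samples)
  return -math.log(p_later(now - last_heartbeat, mu, sigma)) / math.log(10)
end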

SLIDE 8

Architecture of the Group Membership Service

[Architecture diagram: each computer runs an instance of the group membership service. Per-node components: a Failure Detector (Monitor + accrual failure detector) attached to the monitored process, Membership Management, Information Dissemination, and Failure Handlers 1..N. Four nodes (Node1-Node4) are shown.]

SLIDE 9

Membership Management

 Handles membership requests
 Disseminates information about new members
    Informs them about existing members
 Removes failed members from the group
 Failed processes can also rejoin
    Epoch mechanism (sketched below)
    Only 32 extra bits in each heartbeat message

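A sketch of how the 32-bit epoch could work; the field names and layout below are assumptions, not the service's actual wire format:

-- Heartbeats carry the sender's id, a per-run sequence number, and an
-- epoch that is incremented each time the process recovers and rejoins
-- (only 32 extra bits per message). Names are illustrative.
local function make_heartbeat(id, epoch, seq)
  return { id = id, epoch = epoch % 2 ^ 32, seq = seq }
end

-- Receivers keep the freshest (epoch, seq) pair per member, so gossip
-- about a previous, failed incarnation can never override information
-- about a rejoined process with a higher epoch.
local function fresher(a, b)
  return a.epoch > b.epoch or (a.epoch == b.epoch and a.seq > b.seq)
end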

SLIDE 10

Failure Detector

 Collects data about k processes
    Push heartbeats
    Gossiped periodically (every Thb)
    If p1 monitors p2, then there is a TCP connection between them
 Accrual Failure Detector
    Keeps track of the last m interarrival times for a given process (see the sketch below)
    Derives a probability that a process has failed
    Calculation is performed in O(log|S|) steps

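A sketch of the per-process bookkeeping this suggests, keeping the last m interarrival times in a circular buffer; m and the field names are assumptions:

-- One window per monitored process; stores the last m heartbeat
-- interarrival times for the accrual detector to consume.
local m = 100 -- window size (illustrative)

local function new_window()
  return { samples = {}, idx = 1, last_arrival = nil }
end

local function record_heartbeat(w, now)
  if w.last_arrival then
    w.samples[w.idx] = now - w.last_arrival -- store interarrival time
    w.idx = (w.idx % m) + 1                 -- wrap around: circular buffer
  end
  w.last_arrival = now
end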

SLIDE 11

Collecting Enough Information

 Adaptive FDs need to receive information about monitored processes regularly
    This also applies to accrual FDs
    Traditional gossip protocols are not regular
 Solution: persistent monitoring relationships between processes (sketched below)
    Established randomly
    Exhibit the desired properties of gossip protocols

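A sketch of establishing those persistent relationships, assuming the node knows the current member list; `k`, `members`, and the sampling helper are illustrative:

-- Pick k distinct members uniformly at random; the node then monitors
-- exactly these (over persistent TCP connections) until membership
-- changes, which makes heartbeat arrivals regular enough for the AFD.
local function pick_monitored(members, k)
  local pool = {}
  for i, v in ipairs(members) do pool[i] = v end -- copy before removing
  local chosen = {}
  for _ = 1, math.min(k, #pool) do
    local i = math.random(#pool)
    chosen[#chosen + 1] = table.remove(pool, i)  -- sample w/o replacement
  end
  return chosen
end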

SLIDE 12

Failure Handlers

 For each monitored process, a set of thresholds is defined
    For example: 85%, 90%, and 95%
    A handler is associated with each one
 Several handling strategies are possible
    Each is executed when the corresponding threshold is reached
 It is easy to define application-specific handlers (see the sketch below)

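A sketch of threshold-based dispatch; the registration API and the one-shot firing policy are assumptions for illustration:

-- Handlers are registered with a suspicion threshold and fired (once
-- per process, in this sketch) when the accrual detector's output for
-- that process crosses the threshold.
local handlers = {}

local function register_handler(threshold, fn)
  handlers[#handlers + 1] = { threshold = threshold, fn = fn, fired = {} }
end

local function on_suspicion(pid, level)
  for _, h in ipairs(handlers) do
    if level >= h.threshold and not h.fired[pid] then
      h.fired[pid] = true
      h.fn(pid, level)
    end
  end
end

-- Example of application-specific handlers: warn early, act when the
-- detector is almost certain.
register_handler(0.85, function(pid) print("suspecting " .. pid) end)
register_handler(0.95, function(pid) print("rescheduling tasks of " .. pid) end)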

SLIDE 13

Information Dissemination

 Responsible for gossiping information
    About failed nodes (specific messages)
       Important for failure handling
    About correct members (piggybacked in heartbeat messages)
 Dissemination speed is based on parameter j (sketched below)
    j should be O(log(N))

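A sketch of the failure-notice path, with j derived from the group size; `members` and `send` are stand-ins for the real membership list and transport:

-- Failure notices travel in dedicated messages to j gossip targets,
-- while news about correct members rides along on heartbeats. Keeping
-- j around log(N) spreads a notice in O(log N) rounds without flooding.
local function fanout(n)
  return math.max(1, math.ceil(math.log(n)))
end

local function gossip_failure(members, failed_id, send)
  local j = fanout(#members)
  for _ = 1, j do
    local target = members[math.random(#members)]
    send(target, { type = "failure", id = failed_id })
  end
end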

SLIDE 14

Implementation

 Written in Lua
    Compact, efficient, extensible, and platform-independent
 The service is packaged as a reusable Lua module (hypothetical usage sketched below)
 Uses a lightweight CORBA ORB (OiL) for IPC
    Also written in Lua
 Approximately 80 KB of source code
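
The slides do not show the module's API, so the usage below is entirely hypothetical; every name, parameter, and value is an assumption meant only to suggest what "reusable Lua module" might look like in practice:

-- Hypothetical client code; module name, constructor, and methods are
-- invented for illustration and do not reflect the real API.
local membership = require("membership")

local svc = membership.new{
  heartbeat_interval = 2, -- Thb, in seconds
  monitored = 4,          -- k: processes this node monitors
  fanout = 6,             -- j: gossip targets per dissemination round
}

-- Register an application-specific failure handler at the 95% threshold.
svc:on_threshold(0.95, function(pid)
  print(pid .. " considered failed; rescheduling its tasks")
end)

svc:join("bootstrap-node.example.org:4000") -- hypothetical bootstrap peer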

SLIDE 15

Initial Evaluation

 Main goal: to assess scalability and resilience to failures
 20-140 concurrent nodes
    Distributed across three machines, each equipped with 1 GB of RAM
 100 Mbps Fast Ethernet network
 Emulated WAN
    Latency = 500 ms, jitter = 250 ms
 Parameters: Thb = 2 s, k = 4, j = 6

SLIDE 16

Initial Evaluation

 Two situations:
    When no failures occur
       20, 40, 60, 80, 100, 120, and 140 processes
    When processes fail, including realistically large numbers of simultaneous failures
       140 processes
       10, 20, 30, and 40% failures
 Number of messages sent per process as the measure of scalability
SLIDE 17

Scenario 1: No failures

SLIDE 18

Scenario 2: 10-40% of process failures

No process became isolated.

Almost 95% of the processes were still monitored by at least k – 1 processes.

SLIDE 19

Scenario 2: 40% of process failures

SLIDE 20

Concluding Remarks

 Main contribution: combining gossip-based information dissemination with accrual FDs
    while guaranteeing that the AFD collects enough information;
    scalably; and
    in a timely, fault-tolerant way
 Ongoing work:
    More experiments
    Self-organization for better resilience and scalability
    Periodic dissemination of failure information

SLIDE 21

Thank You!

Contact: Fernando Castor fcastor@acm.org