

SLIDE 1

Introduction PastryGrid Fault Tolerance in PastryGrid Conclusion

Fault-Tolerance for PastryGrid Middleware

Christophe Cérin1, Heithem Abbes1,2, Mohamed Jemni2, Yazid Missaoui2

1LIPN, Université de Paris XIII, CNRS UMR 7030, France

2UTIC, ESSTT, Université de Tunis, Tunisia

HPGC’10 - IPDPS

SLIDE 2

Outline

1. Introduction
2. PastryGrid
3. Fault Tolerance in PastryGrid
4. Conclusion

SLIDE 3

Desktop Grid Architectures Desktop Grid

[Figure: centralized (monolithic) desktop grid architecture — user/admin interface, firewall/NAT, application scheduler, task/data/net/FS/sandbox protocols, and a central coordinator performing resource match-making]

Key Points Federation of thousands of nodes; Internet as the communication layer: no trust! Volatility; local IPs; firewalls

SLIDE 4

Desktop Grid Architectures Desktop Grid

[Figure: modular desktop grid architecture — the same components (user/admin interface, application scheduler, task/data/net/FS/sandbox protocols, firewall/NAT) built around a configurable Scheduler (basis) and a Data Manager]

Future Generation (in 2006) Distributed architecture; modular architecture: every component is “configurable” (scheduler, storage, transport protocol); direct communications between peers; security; applications from all sciences (e-Science applications)

SLIDE 5

In search of a distributed architecture PastryGrid An approach based on a structured overlay network to discover (on the fly) the next node to execute the next task

SLIDE 6

In search of a distributed architecture PastryGrid An approach based on a structured overlay network to discover (on the fly) the next node to execute the next task Decentralizes the execution of a distributed application with precedences between tasks

SLIDE 7

PastryGrid’s overview Main objectives Fully distributed execution of task graph;

SLIDE 8

PastryGrid’s overview Main objectives Fully distributed execution of task graph; Distributed resource management;

SLIDE 9

PastryGrid’s overview Main objectives Fully distributed execution of task graph; Distributed resource management; Distributed coordination;

SLIDE 10

PastryGrid’s overview Main objectives Fully distributed execution of task graph; Distributed resource management; Distributed coordination; Dynamic creation of an execution environment;

SLIDE 11

PastryGrid’s overview Main objectives Fully distributed execution of task graph; Distributed resource management; Distributed coordination; Dynamic creation of an execution environment; No central element;


SLIDE 13

PastryGrid’s Terminology Task terminology Friend tasks: T2, T3 share the same successor (T6)

SLIDE 14

PastryGrid’s Terminology Task terminology Friend tasks T2, T3: share the same successor (T6) Shared task T6: has n > 1 ancestors (T2, T3)

SLIDE 15

PastryGrid’s Terminology Task terminology Friend tasks T2, T3: share the same successor (T6) Shared task T6: has n > 1 ancestors (T2, T3) Isolated tasks T4, T5: have a single ancestor

SLIDE 16

PastryGrid’s Terminology Task terminology Friend tasks T2, T3: share the same successor (T6) Shared task T6: has n > 1 ancestors (T2, T3) Isolated tasks T4, T5: have a single ancestor Example
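The task terminology above can be sketched as a small classifier over the precedence graph. The edge set below is a hypothetical reconstruction of the slide's example DAG (the figure itself is not in the text, so the edges are assumptions):

```python
from collections import defaultdict

def classify(edges):
    """Classify tasks of a DAG given (ancestor, successor) edges."""
    ancestors = defaultdict(set)
    for a, s in edges:
        ancestors[s].add(a)
    # shared task: more than one ancestor
    shared = {t for t, anc in ancestors.items() if len(anc) > 1}
    # friend tasks: tasks that share a successor, i.e. the ancestors
    # of any shared task
    friends = set()
    for t in shared:
        friends |= ancestors[t]
    # isolated tasks: a single ancestor and no shared successor
    isolated = {t for t, anc in ancestors.items() if len(anc) == 1} - friends
    return friends, shared, isolated

# assumed edge set reconstructing the slide's example
edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T2", "T6"),
         ("T3", "T5"), ("T3", "T6")]
friends, shared, isolated = classify(edges)
```

On this assumed DAG, T2 and T3 come out as friends (they share T6), T6 as shared, and T4, T5 as isolated, matching the slide's example.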

SLIDE 17

PastryGrid components Addressing scheme to identify applications and users (based on hashing application name + submission date + user name — DHT (Pastry))

SLIDE 18

PastryGrid components Addressing scheme to identify applications and users (based on hashing application name + submission date + user name — DHT (Pastry)) Protocol of resource discovery; no dedicated nodes for the search of the next node to use → on the fly! Optimization: the machine that terminates last starts the search.

SLIDE 19

PastryGrid components Addressing scheme to identify applications and users (based on hashing application name + submission date + user name — DHT (Pastry)) Protocol of resource discovery; no dedicated nodes for the search of the next node to use → on the fly! Optimization: the machine that terminates last starts the search. Rendez-vous concept (RDV); objectives: localisation of a node without its IP; task coordination; data recovery;

SLIDE 20

PastryGrid components Addressing scheme to identify applications and users (based on hashing application name + submission date + user name — DHT (Pastry)) Protocol of resource discovery; no dedicated nodes for the search of the next node to use → on the fly! Optimization: the machine that terminates last starts the search. Rendez-vous concept (RDV); objectives: localisation of a node without its IP; task coordination; data recovery; coordination protocol between machines participating in the application.


SLIDE 22

RDV Concept Coordinator Known at the beginning;

SLIDE 23

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place;

SLIDE 24

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes;

SLIDE 25

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management;

SLIDE 26

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload)

SLIDE 27

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload) RDV Unknown;

SLIDE 28

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload) RDV Unknown; Variable;

SLIDE 29

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload) RDV Unknown; Variable; Failure: may still run;

SLIDE 30

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload) RDV Unknown; Variable; Failure: may still run; Distributed data management;

SLIDE 31

RDV Concept Coordinator Known at the beginning; Central element at a dedicated place; Failure: the system crashes; Centralized resource management; Management of all applications (overload) RDV Unknown; Variable; Failure: may still run; Distributed data management; RDV for each application (limited overload)


SLIDE 33

How PastryGrid works

SLIDE 34

How PastryGrid works Hash (Application Name + User Name + Submission Date): Unique identifier ApplicationId

SLIDE 35

How PastryGrid works Hash (Application Name + User Name + Submission Date): Unique identifier ApplicationId Initialization of RDV: The machine which is closest numerically to ApplicationId
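The two steps above can be sketched as follows — hashing the application metadata into an id, then picking as RDV the live node whose id is numerically closest. SHA-1, the 32-bit id space, and the sample node ids are illustrative assumptions; PastryGrid's actual hash function and id width may differ:

```python
import hashlib

def application_id(name, user, date, bits=32):
    """Hash application name + user name + submission date into the id space."""
    digest = hashlib.sha1(f"{name}{user}{date}".encode()).hexdigest()
    return int(digest, 16) % (1 << bits)   # truncate to the id space

def closest_node(app_id, node_ids, bits=32):
    """The RDV: the node numerically closest to app_id on the circular id space."""
    ring = 1 << bits
    return min(node_ids,
               key=lambda n: min((n - app_id) % ring, (app_id - n) % ring))

# hypothetical application and node ids
app_id = application_id("povray", "alice", "2010-04-19")
rdv = closest_node(app_id, [12, 905, 2**20, 2**31])
```

Because the id is a hash of name + user + date, resubmitting the same application on a different date yields a different ApplicationId, hence a different RDV.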

SLIDE 36

How PastryGrid works Hash (Application Name + User Name + Submission Date): Unique identifier ApplicationId Initialization of RDV: The machine which is closest numerically to ApplicationId Search for free machine and assignment of tasks T1, T2 and T3


SLIDE 38

How PastryGrid works Request and Data Recovery by M1, M2 and M3: DataRequest and YourData

SLIDE 39

How PastryGrid works

SLIDE 40

How PastryGrid works M1 assigns T4 to M4, which it had found

SLIDE 41

How PastryGrid works M1 assigns T4 to M4, which it had found M3 ends T3 but does not seek a machine for T6


SLIDE 46

How PastryGrid works M1 assigns T4 to M4, which it had found M3 ends T3 but does not seek a machine for T6 M2 seeks M5 and M6 and assigns T5 and T6


SLIDE 48

Fault Tolerance in PastryGrid Passive replication based on Past (maintenance of k copies of the node states); update copies when a modification occurs; on failure of a source node, automatic creation of a copy (to maintain k)
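The Past-style maintenance described above can be sketched as: keep the state on the k nodes numerically closest to the key, and let another node take over a copy when a replica holder fails. The node ids, key, and k are illustrative assumptions:

```python
def replica_set(key, live_nodes, k=3):
    """The k live nodes numerically closest to the key hold a copy.

    Ties are broken by node id so the result is deterministic.
    """
    return sorted(live_nodes, key=lambda n: (abs(n - key), n))[:k]

nodes = {10, 20, 30, 40, 50}
before = replica_set(25, nodes)   # 20 and 30 are closest to key 25, then 10
nodes.discard(30)                 # a replica holder fails
after = replica_set(25, nodes)    # 40 takes over so that k copies survive
```

The point of the scheme is that no explicit failover protocol is needed: recomputing the closest-k set over the surviving nodes automatically designates the replacement replica holder.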

SLIDE 49

Fault Tolerance in PastryGrid Passive replication based on Past (maintenance of k copies of the node states); update copies when a modification occurs; on failure of a source node, automatic creation of a copy (to maintain k) If we adopt such an approach ⇒ node explosion;

SLIDE 50

Fault Tolerance in PastryGrid Passive replication based on Past (maintenance of k copies of the node states); update copies when a modification occurs; on failure of a source node, automatic creation of a copy (to maintain k) If we adopt such an approach ⇒ node explosion; A new component has been added: the FTC (Fault Tolerant Component) node

Supervises tasks that are running;

SLIDE 51

Fault Tolerance in PastryGrid Passive replication based on Past (maintenance of k copies of the node states); update copies when a modification occurs; on failure of a source node, automatic creation of a copy (to maintain k) If we adopt such an approach ⇒ node explosion; A new component has been added: the FTC (Fault Tolerant Component) node

Supervises tasks that are running; An FTC component for each application; It contacts the RDV to decide which tasks to supervise;

SLIDE 52

Fault Tolerance in PastryGrid Passive replication based on Past (maintenance of k copies of the node states); update copies when a modification occurs; on failure of a source node, automatic creation of a copy (to maintain k) If we adopt such an approach ⇒ node explosion; A new component has been added: the FTC (Fault Tolerant Component) node

Supervises tasks that are running; An FTC component for each application; It contacts the RDV to decide which tasks to supervise; k copies of the FTC and k copies of the RDV per application. In fact there are 3 types of nodes to manage: computing nodes, FTC nodes and RDV nodes;
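A hypothetical sketch of the FTC's supervision role described above: it learns the running tasks from the RDV, then reassigns any task whose node stops responding. The function and parameter names are assumptions for illustration, not PastryGrid's API:

```python
def supervise(running, alive, find_free_node):
    """running: {task: node}; alive: set of nodes still answering heartbeats."""
    reassigned = {}
    for task, node in running.items():
        if node not in alive:              # node stopped answering
            reassigned[task] = find_free_node()
    running.update(reassigned)             # restart failed tasks elsewhere
    return reassigned

# hypothetical state: the RDV reported T1 on M1 and T2 on M2, but only
# M1 is still alive; the free node M7 takes over T2
running = {"T1": "M1", "T2": "M2"}
moved = supervise(running, alive={"M1"}, find_free_node=lambda: "M7")
```

Since the FTC and the RDV are themselves replicated k times by Past, a failure of the supervisor does not stop supervision: a replica takes over with the same task table.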


SLIDE 54

Fault Tolerance in PastryGrid

SLIDE 55

Fault Tolerance in PastryGrid M initializes the RDV and the FTC of the application

SLIDE 56

Fault Tolerance in PastryGrid M initializes the RDV and the FTC of the application M assigns tasks T1, T2 to M1 and M2

SLIDE 57

Fault Tolerance in PastryGrid M initializes the RDV and the FTC of the application M assigns tasks T1, T2 to M1 and M2 PAST creates k (k = 2) replicas RDV1, RDV2 for RDV and FTC1, FTC2 for FTC

SLIDE 58

Fault Tolerance in PastryGrid

SLIDE 59

Fault Tolerance in PastryGrid M1 and M2 recover the data for T1 and T2 from the RDV

SLIDE 60

Fault Tolerance in PastryGrid M1 and M2 recover the data for T1 and T2 from the RDV The RDV informs the FTC of running tasks (T1 and T2)

SLIDE 61

Fault Tolerance in PastryGrid M1 and M2 recover the data for T1 and T2 from the RDV The RDV informs the FTC of running tasks (T1 and T2) The FTC supervises the execution of tasks T1 and T2 on M1 and M2
SLIDE 63

PastryGrid Validation The FT part Intensive experiments have been conducted (each machine has a probability P to fail for X seconds): P = 20%, 40%, 80%; 100 applications (2 to 128 parallel tasks); on 200 nodes

SLIDE 64

PastryGrid Validation The FT part Intensive experiments have been conducted (each machine has a probability P to fail for X seconds): P = 20%, 40%, 80%; 100 applications (2 to 128 parallel tasks); on 200 nodes. Main observations:

In all cases, PastryGrid terminates; the recovery time depends on the node type; the delay varies from 4:53s to 7:16:41s. . . but it works! The number of delayed applications varies from 44 to 98.
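The failure model used in these experiments can be sketched as follows: each machine independently fails with probability P and stays down for X seconds. The seed, the 30-second downtime, and the machine names are illustrative, not the paper's exact setup:

```python
import random

def inject_failures(machines, p, x_seconds, rng):
    """Each machine independently fails with probability p, down x_seconds."""
    return {m: x_seconds for m in machines if rng.random() < p}

rng = random.Random(42)            # fixed seed for reproducibility
down = inject_failures([f"M{i}" for i in range(200)],
                       p=0.2, x_seconds=30, rng=rng)
```

With P = 0.2 over 200 nodes, roughly 40 machines are expected to fail in a round, which is why the observed completion delays vary so widely across runs.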

SLIDE 65

Conclusion and Perspectives Conclusion PastryGrid: Fault-tolerant decentralized system for running distributed applications with precedence between tasks

SLIDE 66

Conclusion and Perspectives Conclusion PastryGrid: Fault-tolerant decentralized system for running distributed applications with precedence between tasks Creation of a dynamic execution environment for each application

SLIDE 67

Conclusion and Perspectives Conclusion PastryGrid: Fault-tolerant decentralized system for running distributed applications with precedence between tasks Creation of a dynamic execution environment for each application Decentralized collaboration between machines for application tasks management

SLIDE 68

Conclusion and Perspectives Perspectives DG has proved to be relevant for resource sharing ⇒ transpose this success story to the Cloud and PaaS universes ⇒ offer a technical alternative to the big server farms of Google, Salesforce and Amazon

SLIDE 69

Conclusion and Perspectives Perspectives DG has proved to be relevant for resource sharing ⇒ transpose this success story to the Cloud and PaaS universes ⇒ offer a technical alternative to the big server farms of Google, Salesforce and Amazon PastryGrid builds on emerging open-source Cloud solutions. From an economic point of view: if it is less expensive to host services locally and if it supports a wide range of applications → more potential partners, then small/medium-size companies will adopt PastryGrid;


SLIDE 71

Fault-Tolerance for PastryGrid Middleware

Christophe Cérin1, Heithem Abbes1,2, Mohamed Jemni2, Yazid Missaoui2

1LIPN, Université de Paris XIII, CNRS UMR 7030, France

2UTIC, ESSTT, Université de Tunis, Tunisia

HPGC’10 - IPDPS