Fault-Tolerance for PastryGrid Middleware

  1. Fault-Tolerance for PastryGrid Middleware. Christophe Cérin (1), Heithem Abbes (1, 2), Mohamed Jemni (2), Yazid Missaoui (2). (1) LIPN, Université de Paris XIII, CNRS UMR 7030, France; (2) UTIC, ESSTT, Université de Tunis, Tunisia. HPGC'10 - IPDPS

  2. Outline: 1. Introduction; 2. PastryGrid; 3. Fault Tolerance in PastryGrid; 4. Conclusion

  3. Desktop Grid Architectures. Desktop Grid key points: federation of thousands of nodes; Internet as the communication layer: no trust!; volatility; local IP; firewall. (Figure: architecture diagram; labels not recoverable.)

  4. Desktop Grid Architectures. Desktop Grid future generation (in 2006): distributed architecture; architecture with modularity: every component is "configurable" (scheduler, storage, transport protocol); direct communications between peers; security; applications coming from any science (e-Science applications). (Figure: architecture diagram; labels not recoverable.)

  6. In search of a distributed architecture. PastryGrid: an approach based on a structured overlay network to discover, on the fly, the next node that will execute the next task. It decentralizes the execution of a distributed application with precedence constraints between tasks.

  12. PastryGrid's overview. Main objectives: fully distributed execution of a task graph; distributed resource management; distributed coordination; dynamic creation of an execution environment; no central element.

  16. PastryGrid's terminology. Task terminology (example): friend tasks: T2 and T3 share the same successor (T6); shared task: T6 has n > 1 ancestors (T2, T3); isolated tasks: T4 and T5 each have a single ancestor.
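The terminology above can be illustrated with a minimal sketch that classifies the tasks of a small graph. Only the edges T2 -> T6 and T3 -> T6, and the fact that T4 and T5 have a single ancestor, come from the slide; the edges T2 -> T4 and T3 -> T5, as well as the names `EDGES` and `classify`, are assumptions made for illustration and are not part of PastryGrid.

```python
from collections import defaultdict

# Hypothetical task graph. T2 -> T6 and T3 -> T6 come from the slide;
# T2 -> T4 and T3 -> T5 are assumed so that T4 and T5 have one ancestor.
EDGES = [("T2", "T4"), ("T3", "T5"), ("T2", "T6"), ("T3", "T6")]


def classify(edges):
    """Return (shared, isolated, friends) for a list of (ancestor, successor) edges."""
    parents = defaultdict(set)
    for ancestor, successor in edges:
        parents[successor].add(ancestor)

    # Shared task: more than one ancestor.
    shared = {task for task, p in parents.items() if len(p) > 1}
    # Isolated task: exactly one ancestor.
    isolated = {task for task, p in parents.items() if len(p) == 1}
    # Friend tasks: the ancestors of a shared task, grouped by that successor.
    friends = {task: set(parents[task]) for task in shared}
    return shared, isolated, friends


shared, isolated, friends = classify(EDGES)
print("shared:", sorted(shared))
print("isolated:", sorted(isolated))
print("friends:", {t: sorted(f) for t, f in friends.items()})
```

With this assumed graph, T6 is the only shared task, T4 and T5 are isolated, and T2 and T3 are friends because they share T6 as successor.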

  21. PastryGrid components. Addressing scheme to identify applications and users, based on hashing (application name + submission date + user name) into the Pastry DHT. Resource discovery protocol: no dedicated nodes search for the next node to use; the search happens on the fly. Optimization: the machine that finishes last starts the search. Rendez-vous concept (RDV), with three objectives: locating a node without its IP address; task coordination; data recovery. Coordination protocol between the machines participating in the application.
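The addressing scheme above can be sketched as follows. This is an illustration under stated assumptions, not PastryGrid's code: the use of SHA-1, the 160-bit identifier size, the `|` separator, the sample values, and the helper names `make_id`, `ring_distance` and `closest_node` are all hypothetical. The one property taken from Pastry itself is that a key is delivered to the live node whose identifier is numerically closest to it, which is how a node can be located without knowing its IP address.

```python
import hashlib

ID_BITS = 160          # assumption: SHA-1 sized identifiers
RING = 1 << ID_BITS    # size of the circular identifier space


def make_id(*parts):
    """Hash the given strings into one integer identifier (assumed SHA-1)."""
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def ring_distance(a, b):
    """Distance between two identifiers on the circular identifier space."""
    d = abs(a - b)
    return min(d, RING - d)


def closest_node(key, node_ids):
    """Return the node identifier numerically closest to the key."""
    return min(node_ids, key=lambda n: ring_distance(n, key))


# Hypothetical application name, submission date and user name.
app_key = make_id("my-app", "2010-01-01T00:00:00", "alice")
# Hypothetical overlay of eight nodes.
nodes = [make_id(f"node-{i}") for i in range(8)]
rdv = closest_node(app_key, nodes)
print(f"application key: {app_key:040x}")
print(f"rendez-vous node: {rdv:040x}")
```

Any machine that knows the three strings can recompute the same key, so no directory of node addresses is needed to reach the rendez-vous node.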

  29. RDV Concept. Coordinator: known at the beginning; central element in a dedicated place; failure: the system crashes; centralized resource management; management of all applications (overload). RDV: unknown at the beginning; variable place; failure: may still run.
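The contrast above between a fixed coordinator and a rendez-vous point can be sketched with a toy identifier space. Everything here is hypothetical: the 16-bit ring, the node identifiers, the key, and the helper names `resolve` and `coordinator_lookup`. The sketch only shows that a key can be resolved again to another live node after a failure; how PastryGrid restores the state held by the failed rendez-vous node is not described in this part of the slides and is not modelled.

```python
RING = 2 ** 16  # small hypothetical identifier space


def ring_distance(a, b):
    """Distance between two identifiers on the circular identifier space."""
    d = abs(a - b)
    return min(d, RING - d)


def resolve(key, alive):
    """Rendez-vous style lookup: the live node numerically closest to the key."""
    return min(alive, key=lambda n: ring_distance(n, key))


def coordinator_lookup(coordinator, alive):
    """Fixed-coordinator lookup: there is no fallback if the coordinator is down."""
    return coordinator if coordinator in alive else None


alive = {1000, 20000, 41000, 60000}  # hypothetical live node identifiers
key = 42000                          # hypothetical application key

first = resolve(key, alive)          # node currently acting as rendez-vous
alive.discard(first)                 # that node fails
second = resolve(key, alive)         # the key now maps to another live node

print("rendez-vous before failure:", first)
print("rendez-vous after failure:", second)
print("fixed coordinator after failure:", coordinator_lookup(first, alive))
```

Because each application has its own key, different applications generally resolve to different nodes, which avoids the single overloaded coordinator listed in the left-hand column.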
