MTAGS 2010: 3rd IEEE Workshop on Many-Task Computing on Grids and Supercomputers
Improving Many-Task Computing in Scientific Workflows Using P2P Techniques
Jonas Dias, Eduardo Ogasawara, Daniel de Oliveira, Esther Pacitti, Marta Mattoso
COPPE, Federal University of Rio de Janeiro, Brazil
INRIA & LIRMM, Montpellier, France

Introduction
• Scientific Experiments
• Petascale Computing
  – Behavior of hundreds of thousands of processors
  – Parallel execution failures
• Scientific Workflows
  – Represent the chaining of activities of an experiment
  – Scientific Workflow Management Systems (SWfMS)
[Figure: Typical Scientific Workflow, with stages Pre-processing, Execution Kernel, Post-processing]
Slide footer: Improving Many-Task Computing in Scientific Workflows Using P2P Techniques, 11/15/2010

Experiment Execution
• The same workflow may run several times
  – 5000 parameter combinations to try
  – 3 workflow variations
  – Total of 15000 instances to be executed
• Motivation to parallelize
  – Obtain the results in a timely manner
  – Clusters, Grids and Clouds
• Utility Computing model
  – Deliver the answers while they are still needed
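
The instance count on this slide is the product of the two sweep dimensions. A minimal sketch of that enumeration follows; the integer parameter ids and the variation names are hypothetical stand-ins, since the slides do not say what the parameters or variations actually are.

```python
from itertools import product


def enumerate_instances(parameter_combinations, workflow_variations):
    """Return an iterator with one (variation, parameters) pair per instance."""
    return product(workflow_variations, parameter_combinations)


# Hypothetical stand-ins: integer ids for the 5000 parameter combinations
# and three made-up names for the workflow variations.
parameter_combinations = range(5000)
workflow_variations = ("variation_a", "variation_b", "variation_c")

instances = list(enumerate_instances(parameter_combinations, workflow_variations))
print(len(instances))  # 15000
```

Each pair is independent of the others, which is what makes the sweep a natural fit for many-task execution.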

Difficulties in Workflow Parallelism
• MPI
  – Complex and legacy codes
  – Dynamic resource management
  – A job’s process may fail
    • Compromises the whole execution
    • Resubmission relies on the scientist's manual control
  – Not feasible for a huge number of tasks
• Grid Schedulers
  – Submit many jobs simultaneously
  – Waiting time in resource management queues

MTC Workflow Parallelism
• Many-task computing (MTC)
  – Improves parameter sweep and data parallelism
• HPC Cluster Systems
  – Jobs are not easy to set up for submission
  – Centralized control
  – Compute nodes may fail
• Open Issues
  – Best approaches to set up an experiment execution
  – Load balancing
  – Dynamic resource management
  – Failure control
    • What has failed and needs to be rescheduled?

MTC, Workflows and Clusters
• The Heracles Approach
  – Approach to execute workflow activities
    • More transparent setup
    • Load balancing
    • Quality of service
    • Distributed provenance gathering
  – Uses the P2P model
    • To be implemented in a cluster scheduler
    • Not a P2P infrastructure

Heracles Overview
[Diagram labels: Scientific Workflow Management System, Workflow, MTC Scheduler, Heracles, Cluster]

Heracles Structure
[Diagram built up over six slides. Labels on the final build, in extraction order: SWfMS, Workflow, MTC Scheduler, Executer, Distributed Task Table, Heracles, Workflow Instances, Task Wrapper, Overlay Handler, Task Execution Monitoring, Process Scheduling, Heracles Process, Cluster Resource Manager, Node, Process]

P2P view
[Diagram: Heracles virtual P2P network view. Labels: Cluster, Resource Manager, Node, Process]

Heracles

Transparency
• Set up the deadline, not the number of nodes
• Heracles controls the number of involved nodes
  – Partial efficiency of the execution
  – Automatically updates the number of necessary processors
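
One way to read "set the deadline, not the number of nodes" is as a periodic re-estimate of the cores needed to finish the remaining tasks on time. The sketch below is an assumption made for illustration, not the policy Heracles uses: it assumes one task per core at a time and tasks of similar duration, and it ignores the partial-efficiency measure mentioned on the slide. The function name and parameters are hypothetical.

```python
import math


def cores_needed(pending_tasks, avg_task_hours, hours_to_deadline):
    """Estimate how many cores finish the pending tasks before the deadline.

    Assumption: each core runs one task at a time and every task takes
    roughly avg_task_hours.
    """
    if hours_to_deadline <= 0:
        raise ValueError("the deadline has already passed")
    if pending_tasks == 0:
        return 0
    # Number of whole tasks a single core can complete in the time left.
    tasks_per_core = math.floor(hours_to_deadline / avg_task_hours)
    if tasks_per_core == 0:
        raise ValueError("a single task does not fit before the deadline")
    return math.ceil(pending_tasks / tasks_per_core)


# At submission: 15000 half-hour tasks and a 20-hour deadline.
initial = cores_needed(15000, 0.5, 20)
# Ten hours in, the run is behind schedule, so the estimate grows.
refreshed = cores_needed(9000, 0.5, 10)
print(initial, refreshed)  # 375 450
```

Calling the function again as tasks complete is what would let the core count follow the actual progress of the run.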

Dynamic Scheduling example
[Chart: completed tasks per hour and processing cores, plotted over 0 to 20 hours; annotated values: 173 tasks per hour, 64 cores]

Efficiency
[Chart: efficiency, on a scale from 0 to 1, plotted over 0 to 20 hours]

Load Balancing
• Clusters depend on the head node control
• Tasks can have their own autonomy
  – Like P2P dynamic control
• Hierarchical organization
  – Based on P2P hierarchical networks
  – Group leaders
  – Working nodes
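
The hierarchy of group leaders and working nodes can be sketched as below. This is a sketch under stated assumptions rather than the Heracles design: the slides do not say how leaders are chosen or how work moves between groups, so picking the first node as leader, spreading tasks round-robin, and letting an idle group take work from the busiest one are all illustrative choices, and every name is hypothetical.

```python
from collections import deque


def build_groups(node_ids, group_size):
    """Split nodes into groups; the first node of each group acts as leader."""
    groups = []
    for start in range(0, len(node_ids), group_size):
        members = node_ids[start:start + group_size]
        groups.append({"leader": members[0],
                       "workers": members[1:],
                       "queue": deque()})
    return groups


def distribute(tasks, groups):
    """Spread tasks round-robin over the queues held by the group leaders."""
    for index, task in enumerate(tasks):
        groups[index % len(groups)]["queue"].append(task)


def pull_task(group, groups):
    """Give a worker its next task, taking from the busiest group if idle."""
    if group["queue"]:
        return group["queue"].popleft()
    donor = max(groups, key=lambda g: len(g["queue"]))
    return donor["queue"].pop() if donor["queue"] else None


groups = build_groups(list(range(8)), group_size=4)
distribute(range(10), groups)
first = pull_task(groups[0], groups)          # served from the group's own queue
for _ in range(4):                            # drain the rest of group 0
    pull_task(groups[0], groups)
stolen = pull_task(groups[0], groups)         # group 0 is empty, so it takes from group 1
print(first, stolen)  # 0 9
```

Because workers pull tasks instead of waiting for a head node to push them, no single node has to track the state of the whole cluster.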

Quality of Service
• Job’s process failure
  – Hard to reschedule with traditional approaches
  – Manual rescheduling is not feasible
  – How to address it in the provenance collection
• P2P model can help
  – Autonomy of the nodes
  – Unfinished or failed tasks can be rescheduled
  – Provenance may register all execution attempts or only the last one

When to reschedule?
• Group leaders are responsible for the decision
  – Distributed table data
• Status of the tasks on the distributed table
  – Pending, running or finished
• Average execution time of a task
• To reschedule means to change the status of the task to pending
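
The decision described on this slide can be sketched as a scan over the task table. The sketch makes several assumptions that the slides do not state: a plain dictionary stands in for the distributed table, a task counts as stalled once it has run longer than a multiple of the average execution time, and `slack_factor` with its default of 3.0 is a hypothetical parameter.

```python
from dataclasses import dataclass
from typing import Optional

PENDING, RUNNING, FINISHED = "pending", "running", "finished"


@dataclass
class TaskEntry:
    status: str = PENDING
    started_at: Optional[float] = None  # seconds; set when the task starts running


def reschedule_stalled(table, now, avg_exec_time, slack_factor=3.0):
    """Set overdue running tasks back to pending and return their ids.

    Assumption: a task is overdue once it has been running for more than
    slack_factor times the average execution time.
    """
    rescheduled = []
    for task_id, entry in table.items():
        overdue = (entry.status == RUNNING
                   and entry.started_at is not None
                   and now - entry.started_at > slack_factor * avg_exec_time)
        if overdue:
            entry.status = PENDING   # rescheduling is only a status change
            entry.started_at = None
            rescheduled.append(task_id)
    return rescheduled


table = {
    "t1": TaskEntry(RUNNING, 0.0),    # running for 100 s, well past 3 x 10 s
    "t2": TaskEntry(RUNNING, 90.0),   # running for 10 s, still within bounds
    "t3": TaskEntry(FINISHED),
    "t4": TaskEntry(),
}
print(reschedule_stalled(table, now=100.0, avg_exec_time=10.0))  # ['t1']
```

A group leader would run such a scan periodically over the tasks of its own group; any idle worker can then pick up the tasks that went back to pending.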

Case Study
• Analyze the impact of churn events on task execution on clusters
  – Many workflow activities to be executed
  – Activities are decomposed into tasks
    • Suffer from churn events
  – Activities producing 512, 1024, 2048 and 4096 tasks
  – Tasks are classified as small, medium and large
  – Seven days simulated
  – Calibrated using real experiment data

Rescheduling Types
• Manual Rescheduling
  – The scientist checks the activity status every twelve hours
  – If a failure happens, all the tasks of the activity are rescheduled
• Automatic Rescheduling
  – Only the task that has failed is rescheduled
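
The difference between the two policies shows up in how many tasks run again after a failure. The sketch below only counts re-executions for a single activity; it leaves out the twelve-hour detection delay of the manual policy, and the function name and task ids are hypothetical, not taken from the simulation in the case study.

```python
def tasks_to_rerun(activity_tasks, failed_tasks, policy):
    """Count the tasks that run again after failures in one activity."""
    failed_in_activity = set(failed_tasks) & set(activity_tasks)
    if policy == "manual":
        # Any failure causes the whole activity to be rescheduled.
        return len(activity_tasks) if failed_in_activity else 0
    if policy == "automatic":
        # Only the tasks that failed are rescheduled.
        return len(failed_in_activity)
    raise ValueError(f"unknown policy: {policy}")


activity = list(range(512))   # an activity that produced 512 tasks
failed = [7, 130, 401]        # hypothetical ids of tasks hit by churn

print(tasks_to_rerun(activity, failed, "manual"))     # 512
print(tasks_to_rerun(activity, failed, "automatic"))  # 3
```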

Small Tasks

Medium Tasks

Big Tasks

Conclusions
• Empowering the execution of scientific experiments
  – Scientific workflow parallelization on huge clusters
  – Many-task computing
  – Process failures, poor load balancing, usability issues
• Heracles Approach
  – Transparency, load balancing and quality of service
  – Using the P2P model on clusters
• Case study showed the gains with automatic rescheduling

Future Work
• Analyze the advantages that MTC schedulers can achieve when using the full Heracles approach
• Using Heracles on real experiments
  – Implementing it on real schedulers such as Hydra
• Evaluate other fault-tolerance mechanisms such as redundant executions

Acknowledgements
