Starting Workflow Tasks Before They’re Ready
Wladislaw Gusew, Björn Scheuermann
Computer Engineering Group, Humboldt University of Berlin

Agenda
◮ Introduction
◮ Execution semantics
◮ Methods and tools
◮ Simulation results
◮ Experimental results
◮ Conclusion

Big data in research

Scientific workflow example
◮ Modeled as a Directed Acyclic Graph (DAG)
◮ Executed on distributed systems
◮ Contains aggregation and broadcast types of tasks
◮ Demanding on network resources

Execution semantics
◮ But in reality resources are limited
◮ Execute only a subset of parent tasks concurrently (insufficient number of workers)
◮ Congestion of network (all parent tasks have the same priority)
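The classic semantics behind these bullets can be sketched in a few lines: a task becomes ready only once all of its parents have finished, and no more ready tasks are started than there are free workers. The task names and the `parents` mapping are hypothetical, and the tie-breaking by name only stands in for "all parent tasks have the same priority".

```python
from typing import Dict, List, Set


def ready_tasks(parents: Dict[str, List[str]], done: Set[str], running: Set[str]) -> List[str]:
    """Tasks that have not started yet and whose parents have all finished."""
    return sorted(
        task
        for task, deps in parents.items()
        if task not in done and task not in running and all(d in done for d in deps)
    )


def dispatch(parents: Dict[str, List[str]], done: Set[str], running: Set[str], workers: int) -> List[str]:
    """Select ready tasks for the free workers; ties are broken by name only."""
    free = max(0, workers - len(running))
    return ready_tasks(parents, done, running)[:free]


# Hypothetical workflow: four parent tasks feed one aggregation task.
parents = {"p1": [], "p2": [], "p3": [], "p4": [], "agg": ["p1", "p2", "p3", "p4"]}

# With only two workers, just a subset of the parents can run concurrently.
first_wave = dispatch(parents, done=set(), running=set(), workers=2)

# The aggregation task stays blocked until the very last parent is done.
almost = ready_tasks(parents, done={"p1", "p2", "p3"}, running=set())
finally_ready = ready_tasks(parents, done={"p1", "p2", "p3", "p4"}, running=set())
```

In this sketch `agg` never appears among the ready tasks while even one parent is unfinished, which is the delay the following slides address.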

Example execution
◮ Network congestion can slow down processing even further (effects of data losses at the transport protocol layer)
◮ High delay to the start of the aggregation task
◮ Low performance and high execution costs (e.g., in computation clouds)
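A small idealized model illustrates why equal priorities delay the aggregation task. It assumes n equally sized parent outputs sent over one bottleneck link, and it ignores the transport-layer losses mentioned above. The sizes and the bandwidth are hypothetical numbers, not measurements from the talk.

```python
from typing import List


def fair_share_finish_times(n: int, size_mbit: float, bandwidth_mbit_s: float) -> List[float]:
    """All n transfers share the link equally, so they all complete at the same time."""
    t = n * size_mbit / bandwidth_mbit_s
    return [t] * n


def sequential_finish_times(n: int, size_mbit: float, bandwidth_mbit_s: float) -> List[float]:
    """Transfers get the full link one after another (strict priorities)."""
    return [(i + 1) * size_mbit / bandwidth_mbit_s for i in range(n)]


# Hypothetical setting: 4 inputs of 80 Mbit each over a 10 Mbit/s bottleneck.
fair = fair_share_finish_times(4, 80.0, 10.0)
seq = sequential_finish_times(4, 80.0, 10.0)

# Time at which the first complete input is available in each case.
first_input_fair = min(fair)
first_input_seq = min(seq)
```

Both schemes deliver the last input at the same time in this model, but with prioritized transfers the first complete input arrives after a quarter of that time. Only a task that may start before all inputs are present can exploit this.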

What can we do to improve this?
List of actions:
1. Obtain information on the task’s input characteristics
2. Refine the workflow and inform the execution engine
3. Let the aggregation task “feel comfortable” in the changed setting

Obtaining input characteristics
1. Annotations to workflows
2. Manual code review
3. Automated profiling

Automated profiling
◮ Operating system instrumentation tool
◮ Enables interception of system calls (file open, read/write, file close)
◮ Record and evaluate logfiles with traces of conducted file accesses

[Figure, two panels: "Reads by mAdd in a small workflow" and "Reads by mAdd in a medium sized workflow". x-axis: execution progress [10^8 CPU cycles]; y-axis: read accesses [MB].]
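Evaluating such a trace can be sketched as follows. The slides do not name the instrumentation tool or its log format, so the four-column format below (`cycles syscall path bytes`) and the file names are purely hypothetical. The sketch accumulates read volume per input file over execution progress, which is the quantity plotted in the figure, and derives the order in which inputs are first read.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical trace: "<cpu_cycles> <syscall> <path> <bytes>"
SAMPLE_LOG = """\
100000000 open /in/a.fits 0
120000000 read /in/a.fits 1048576
150000000 read /in/a.fits 1048576
200000000 open /in/b.fits 0
230000000 read /in/b.fits 524288
260000000 close /in/a.fits 0
"""


def cumulative_reads(log: str) -> Dict[str, List[Tuple[int, float]]]:
    """Map each file to a series of (cycles, cumulative MB read so far)."""
    totals: Dict[str, float] = defaultdict(float)
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for line in log.splitlines():
        if not line.strip():
            continue
        cycles, op, path, nbytes = line.split()
        if op != "read":
            continue  # open/close records carry no payload
        totals[path] += int(nbytes) / 2**20
        series[path].append((int(cycles), totals[path]))
    return dict(series)


def first_read_order(log: str) -> List[str]:
    """Input files ordered by the time of their first read access."""
    series = cumulative_reads(log)
    return sorted(series, key=lambda path: series[path][0][0])


reads = cumulative_reads(SAMPLE_LOG)
order = first_read_order(SAMPLE_LOG)
```

The first-read order indicates which input a task needs first and therefore which parent output is worth delivering first.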

Refining workflow by transforming DAG
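The transformation itself appears on the slide only as a figure. One plausible reading, stated here as an assumption, is that an aggregation task is replaced by a chain of virtual sub-tasks, each depending on a single parent and on the preceding sub-task. The function below sketches that reading; the `#i` naming scheme and the example task names are hypothetical.

```python
from typing import Dict, List


def split_aggregation(parents: Dict[str, List[str]], task: str) -> Dict[str, List[str]]:
    """Return a new parent map in which `task` is replaced by a chain of sub-tasks.

    Assumed scheme: sub-task i depends on the i-th original parent and on
    sub-task i-1, so the first sub-task is ready after a single parent finishes.
    """
    inputs = parents[task]
    if len(inputs) < 2:
        return {t: list(deps) for t, deps in parents.items()}  # nothing to split

    new = {t: list(deps) for t, deps in parents.items() if t != task}
    prev = None
    for i, parent in enumerate(inputs, start=1):
        sub = f"{task}#{i}"
        new[sub] = [parent] if prev is None else [prev, parent]
        prev = sub

    # Children of the original task now wait for the last sub-task instead.
    for t in list(new):
        new[t] = [prev if d == task else d for d in new[t]]
    return new


# Hypothetical workflow: three parents, one aggregation task, one consumer.
dag = {"p1": [], "p2": [], "p3": [], "agg": ["p1", "p2", "p3"], "out": ["agg"]}
refined = split_aggregation(dag, "agg")
```

Because only the graph changes, an execution engine with classic semantics can schedule the refined workflow as usual: each sub-task is an ordinary node whose parents must finish first.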

Realizing virtual task split
◮ Real task is transparently wrapped
◮ FUSE enables the setup of a virtual file system in user space
◮ Access to input files is performed through our wrapper
◮ Wrapper is responsible for maintaining the correct execution logic
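The blocking behavior such a wrapper has to provide can be imitated without FUSE. The sketch below is not a file system and not the authors' wrapper; it only illustrates, under that stated simplification, how a read on an input that has not arrived yet can be held back until the input is published. The class name `InputGate` and the file names are hypothetical.

```python
import threading
from typing import Dict, Iterable, Optional


class InputGate:
    """Holds back readers of an input until a producer marks it as available."""

    def __init__(self, names: Iterable[str]) -> None:
        self._ready: Dict[str, threading.Event] = {n: threading.Event() for n in names}
        self._data: Dict[str, bytes] = {}

    def publish(self, name: str, payload: bytes) -> None:
        """Called when a parent task's output has fully arrived."""
        self._data[name] = payload
        self._ready[name].set()

    def read(self, name: str, timeout: Optional[float] = None) -> bytes:
        """Block until `name` is available, then return its content."""
        if not self._ready[name].wait(timeout):
            raise TimeoutError(f"input {name!r} not available")
        return self._data[name]


gate = InputGate(["a.fits", "b.fits"])

# A parent task delivers its output shortly after the wrapped task has started.
threading.Timer(0.05, gate.publish, args=("a.fits", b"pixels")).start()

# The wrapped task simply reads; the gate makes it wait for the data.
first = gate.read("a.fits", timeout=2.0)
```

The wrapped task needs no changes in this model: it issues an ordinary read and waits until the data exists.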

Evaluation with the Montage workflow

Simulating workflow execution
◮ Java-based simulation framework for scientific workflows
◮ Simulates an execution on a Pegasus/HTCondor stack
◮ Uses the provided Montage workflows with 25, 50, 100, and 1000 tasks
◮ A Python script performed the DAG transformation of the DAX files
◮ Network configured as the bottleneck (by bandwidth limitation)

W. Chen and E. Deelman, “WorkflowSim: A toolkit for simulating scientific workflows in distributed environments,” in eScience’12.
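The authors' transformation script is not shown. As a minimal sketch of its first step, the code below reads the dependency structure from a DAX document with Python's standard XML parser and lists jobs with several parents as candidate aggregation tasks. The embedded sample is made up and far smaller than a Montage DAX file; the `min_inputs` threshold is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Tiny made-up DAX document: two projection jobs feeding one mAdd job.
SAMPLE_DAX = """\
<adag xmlns="http://pegasus.isi.edu/schema/DAX" name="sample">
  <job id="ID01" name="mProjectPP"/>
  <job id="ID02" name="mProjectPP"/>
  <job id="ID03" name="mAdd"/>
  <child ref="ID03">
    <parent ref="ID01"/>
    <parent ref="ID02"/>
  </child>
</adag>
"""


def local(tag: str) -> str:
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_dax(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (job id -> job name, job id -> list of parent job ids)."""
    root = ET.fromstring(text)
    names: Dict[str, str] = {}
    parents: Dict[str, List[str]] = {}
    for element in root:
        kind = local(element.tag)
        if kind == "job":
            names[element.get("id")] = element.get("name")
            parents.setdefault(element.get("id"), [])
        elif kind == "child":
            refs = [p.get("ref") for p in element if local(p.tag) == "parent"]
            parents.setdefault(element.get("ref"), []).extend(refs)
    return names, parents


def aggregation_jobs(parents: Dict[str, List[str]], min_inputs: int = 2) -> List[str]:
    """Jobs with at least `min_inputs` parents (assumed criterion)."""
    return sorted(job for job, deps in parents.items() if len(deps) >= min_inputs)


names, parents = read_dax(SAMPLE_DAX)
candidates = aggregation_jobs(parents)
```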

Simulation results

Variation of number of tasks
[Figure: Simulation results for 50 workers and max-min, comparing Normal and Split. x-axis: number of tasks (25, 50, 100, 1000); y-axis: total workflow runtime [s], logarithmic scale. Percentage annotations in extraction order: 31%, 25%, 19%, 15%.]

Variation of workers
[Figure: Simulation results for Montage 100 and min-min, comparing Normal and Split. x-axis: number of workers (5, 10, 50, 100); y-axis: total workflow runtime [s]. Percentage annotations in extraction order: 10%, 14%, 26%, 25%.]

Variation of scheduling algorithms
[Figure: Simulation results for Montage 100 on 100 workers, comparing Normal and Split. x-axis: scheduling algorithm (labels recovered from the extraction, order uncertain: Min-min, Max-min, Round-robin, HEFT, DHEFT, Random); y-axis: total workflow runtime [s]. Percentage annotations in extraction order: 17%, 34%, 25%, 25%, 27%, 28%.]
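For readers unfamiliar with two of the compared heuristics, here is a textbook-style sketch of min-min and max-min for independent tasks on identical workers. It is not the simulator's implementation, which also has to handle dependencies and data transfers; the task names and runtimes are hypothetical.

```python
from typing import Dict, List, Tuple


def schedule(runtimes: Dict[str, float], n_workers: int, variant: str = "min-min") -> Tuple[List[Tuple[str, int, float]], float]:
    """Greedy list scheduling of independent tasks on identical workers.

    min-min assigns the task with the smallest earliest completion time next,
    max-min the task with the largest one. Returns (plan, makespan), where
    plan holds (task, worker, finish_time) entries in assignment order.
    """
    free_at = [0.0] * n_workers
    pending = dict(runtimes)
    choose = min if variant == "min-min" else max
    plan: List[Tuple[str, int, float]] = []
    while pending:
        # On identical workers, the earliest-free worker gives every task
        # its earliest completion time.
        worker = min(range(n_workers), key=lambda i: free_at[i])
        task = choose(sorted(pending), key=lambda t: free_at[worker] + pending[t])
        finish = free_at[worker] + pending.pop(task)
        free_at[worker] = finish
        plan.append((task, worker, finish))
    return plan, max(free_at)


# Hypothetical runtimes in seconds.
tasks = {"a": 2.0, "b": 3.0, "c": 7.0}
plan_min, makespan_min = schedule(tasks, n_workers=2, variant="min-min")
plan_max, makespan_max = schedule(tasks, n_workers=2, variant="max-min")
```

In this toy instance max-min places the long task first and ends earlier than min-min, which shows how the heuristic alone can shift total runtime.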

Evaluation in a computing cluster
◮ Small cluster of up to 10 compute nodes
◮ Intel i7 CPU @ 2.5 GHz, 8 GB RAM, connected to a common network switch with 1 Gbit/s
◮ Execute Montage 133 workflow in Pegasus/HTCondor
◮ Network bandwidth was limited on the application layer to 10 Mbit/s
◮ 10 repetitions, mean values with 95% confidence intervals
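The last bullet can be made concrete with a short calculation. The slides do not state how the intervals were computed; the sketch assumes the usual Student t interval for the mean, for which the two-sided 95% quantile with 9 degrees of freedom is about 2.262. The runtimes below are invented for illustration and are not measurements from the experiment.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% Student t quantiles, keyed by degrees of freedom.
T_975 = {9: 2.262}


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width of the 95% confidence interval)."""
    dof = len(samples) - 1
    if dof not in T_975:
        raise ValueError("t quantile only tabulated for 10 samples in this sketch")
    mean = statistics.mean(samples)
    half_width = T_975[dof] * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Invented runtimes [s] of 10 repetitions.
runs = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]
mean, half_width = mean_ci95(runs)
interval = (mean - half_width, mean + half_width)
```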

Measurement results
[Figure: Computing cluster results for 1 to 10 workers, comparing Original Montage 133 and Transformed Montage 133. x-axis: number of computing nodes (1 to 10); y-axis: total workflow runtime [s].]

Conclusion
◮ Many “legacy” workflows exist which are executed with classic semantics
◮ Our approach is applicable to aggregation tasks, which are often the most time-intensive tasks in a workflow
◮ By using DAG transformation, no changes to task implementations and execution engines are required
◮ Simulation and real experiment show that performance can be improved by up to 15%
◮ The potential for outperforming the original workflow grows with an increasing number of workers and tasks
