

SLIDE 1

Predicting Intermediate Storage Performance for Workflow Applications

Lauro B. Costa, Samer Al-Kiswany, Abmar Barros*, Hao Yang, and Matei Ripeanu
University of British Columbia; *UFCG, Brazil

PDSW’13, Nov. 18th, co-located with SC’13

SLIDE 2

Storage System

Backend Storage (e.g., NFS, GPFS)

One or few servers

Compute Nodes

High aggregate bandwidth, many nodes

– Storage system co-deployed on the compute nodes
– Avoids the backend storage as a bottleneck
– Opportunity to configure per application

SLIDE 3

Storage System Configuration

Different storage parameters

– e.g., data placement, # of nodes, chunk size

Benefit different workloads

– e.g., data sharing, I/O intensity, read/write size

The proper choice of parameters depends on the workload.
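As a concrete illustration, such knobs might be expressed as in the sketch below; the parameter names and values are hypothetical, not the actual system's configuration format.

```python
# Hypothetical configuration knobs for an intermediate storage system;
# names and values are illustrative, not the actual system's format.
from dataclasses import dataclass

@dataclass
class StorageConfig:
    num_storage_nodes: int  # compute nodes that also run a storage module
    chunk_size_kb: int      # striping unit for file data
    data_placement: str     # e.g., "striped" or "local"

# A data-sharing, read-heavy workload might favor wide striping:
shared_cfg = StorageConfig(num_storage_nodes=16, chunk_size_kb=1024,
                           data_placement="striped")

# A pipeline workload (producer and consumer on the same node) might
# favor local placement to avoid network transfers:
pipeline_cfg = StorageConfig(num_storage_nodes=4, chunk_size_kb=256,
                             data_placement="local")
```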

SLIDE 4

BLAST Example

SLIDE 5

How to support the intermediate storage configuration?

SLIDE 6

Configuration Loop

1. Define a target performance
2. Run the application
3. Analyze system activity
4. Identify parameters

Costly
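A minimal sketch of why this loop is costly: every iteration requires a full application run. The helper names and the toy runtime model below are illustrative.

```python
# Sketch of the manual configuration loop; run_application() stands in
# for a full, costly run, and identify_parameters() for the human
# analysis step. Everything here is illustrative.
def run_application(config):
    return 1000.0 / config["num_storage_nodes"]  # toy runtime model (s)

def identify_parameters(config):
    return {**config, "num_storage_nodes": config["num_storage_nodes"] * 2}

def configure_manually(target_runtime_s, config):
    while run_application(config) > target_runtime_s:  # one full run per check
        config = identify_parameters(config)
    return config

print(configure_manually(150.0, {"num_storage_nodes": 1}))
```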

SLIDE 7

Automated Configuration

Automating the configuration loop with a “What...If...” evaluation engine:

– Inputs: an application benchmark trace and a platform description
– The engine answers “what...if...” queries over candidate configurations
– Output: the desired configuration to execute
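One way such an engine might be invoked is sketched below; the function signature and the crude aggregate-bandwidth model are assumptions, not the authors' actual interface.

```python
# Hypothetical front end for the what...if... engine: predict runtime
# under a candidate configuration without running the application.
def what_if(app_trace, platform, config):
    io_bytes = sum(op["size"] for op in app_trace)         # total I/O volume
    bw = platform["node_bw_bytes_s"] * config["num_storage_nodes"]
    return io_bytes / bw  # crude aggregate-bandwidth estimate (seconds)

trace = [{"size": 512 * 2**20}, {"size": 256 * 2**20}]     # two I/O-heavy tasks
platform = {"node_bw_bytes_s": 100 * 2**20}                # ~100 MiB/s per node
for n in (1, 2, 4, 8):                                     # compare candidates
    print(n, "storage nodes ->", what_if(trace, platform,
                                         {"num_storage_nodes": n}))
```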

SLIDE 8

Predictor Requirements

– Accuracy
– Response time / resource usage
– Usability

SLIDE 9

Storage System Model

Focus at a high level

– Manager, storage nodes, clients
– No low-level details (e.g., CPU)

Simple seeding
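A minimal sketch of what "high level" means here, assuming components are modeled as plain servers with seeded service times; the numbers are made up.

```python
# High-level model sketch: the manager and a storage node are FIFO
# servers with seeded service times; no CPU or disk detail.
SERVICE_TIME = {"manager": 0.2e-3, "storage": 1.5e-3}  # seconds (illustrative)

def simulate(arrivals):
    """Each request visits the manager (metadata) then a storage node
    (data); returns the completion time of each request."""
    free_at = {"manager": 0.0, "storage": 0.0}
    completions = []
    for t in sorted(arrivals):
        for component in ("manager", "storage"):
            start = max(t, free_at[component])       # wait if busy
            free_at[component] = start + SERVICE_TIME[component]
            t = free_at[component]
        completions.append(t)
    return completions

print(simulate([0.0, 0.0005, 0.001]))
```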

SLIDE 10

Storage System Model

SLIDE 11

Seeding the Model

No monitoring changes to the system

– Uses coarse-level measurements
– Infers services’ times

Small deployment

– One instance of each component
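A sketch of the idea, assuming two coarse end-to-end benchmarks against the one-instance deployment; the benchmark bodies are stubs, and the subtraction step is the point.

```python
# Non-intrusive seeding sketch: time coarse operations end to end on a
# small deployment and back out per-component service times. The two
# operations below are stubs standing in for real client calls.
import time

def avg_time(op, n=1000):
    start = time.perf_counter()
    for _ in range(n):
        op()
    return (time.perf_counter() - start) / n

def metadata_op():  pass  # stub: touches only the manager
def read_op():      pass  # stub: touches the manager and a storage node

t_meta = avg_time(metadata_op)                    # ~ manager service time
t_read = avg_time(read_op)                        # ~ manager + storage time
seeded = {"manager": t_meta, "storage": max(t_read - t_meta, 0.0)}
print(seeded)
```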

SLIDE 12

Evaluation

Metrics

– Accuracy
– Response time

Workload

– Synthetic benchmarks
– An application

Testbed: cluster of 20 machines

SLIDE 13

An Application

BLAST

– A DNA database file
– Several queries (tasks) over the file

Evaluate different parameters

– # of storage nodes, # of clients, chunk size
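The exploration can be a plain sweep over these parameters, asking the predictor about each point instead of running BLAST; a sketch reusing the illustrative what_if() from the earlier slide.

```python
# Sketch of sweeping the BLAST parameters with the predictor instead of
# real runs; what_if() is the illustrative function sketched earlier.
from itertools import product

storage_nodes = [1, 2, 4, 8, 16]
clients       = [1, 4, 16]
chunk_kb      = [256, 512, 1024]

def sweep(predict, trace, platform):
    candidates = ({"num_storage_nodes": n, "num_clients": c, "chunk_kb": k}
                  for n, c, k in product(storage_nodes, clients, chunk_kb))
    return min(candidates, key=lambda cfg: predict(trace, platform, cfg))

# best = sweep(what_if, blast_trace, platform)  # each point is a prediction, not a run
```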

SLIDE 14

BLAST Results

– Performance varies across configurations (~2x difference)
– Accuracy allows good decisions
– ~3000x fewer resources than actually running the application

SLIDE 15

Concluding Remarks

– Non-intrusive seeding process (system identification)
– Low runtime
– Accuracy allows good decisions
– The predictor can support development

SLIDE 16

Future Work

Automate parameter exploration

– Prune the space by preprocessing the input
– Induce placement based on task dependencies

Add applications; increase scale; add metrics

– Cost
– Energy (challenging to predict)
– Data transferred (already accurate)

SLIDE 18

Workflow Applications

– A DAG represents task dependencies
– A scheduler controls dependencies and task execution on a cluster
– Tasks communicate via files
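A minimal sketch of this structure, with hypothetical task and file names: tasks declare the files they read and write, and the DAG's edges follow from shared files.

```python
# Workflow-as-DAG sketch: an edge A -> B exists when B reads a file
# that A writes. Task and file names are made up for illustration.
tasks = {
    "split":  {"reads": ["db"],          "writes": ["p1", "p2"]},
    "blast1": {"reads": ["p1", "query"], "writes": ["h1"]},
    "blast2": {"reads": ["p2", "query"], "writes": ["h2"]},
    "merge":  {"reads": ["h1", "h2"],    "writes": ["result"]},
}

edges = [(a, b)
         for a, ta in tasks.items() for b, tb in tasks.items()
         if a != b and set(ta["writes"]) & set(tb["reads"])]
print(edges)  # the scheduler starts a task once all files it reads exist
```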

SLIDE 19

Synthetic Benchmarks

Stress the system

– I/O only; tends to create contention

Based on workflow patterns

– Evaluate different data placements
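The patterns behind these benchmarks (shown on the next slides) can be written down as tiny file-based task graphs; a sketch with illustrative generators.

```python
# The three workflow patterns the benchmarks build on, as task graphs
# (an edge means "consumer reads the producer's file"); the generators
# themselves are illustrative.
def pipeline(n):    # a chain: stage i feeds stage i+1
    return [(f"s{i}", f"s{i+1}") for i in range(n - 1)]

def reduce_(n):     # many producers feed a single consumer
    return [(f"p{i}", "sink") for i in range(n)]

def broadcast(n):   # one producer feeds many consumers
    return [("source", f"c{i}") for i in range(n)]

print(pipeline(4)); print(reduce_(3)); print(broadcast(3))
```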

SLIDE 20

Workflow Patterns

SLIDE 21

Synthetic Benchmarks: Pipeline, Reduce, Broadcast

– Accuracy can support the decision
– ~2000x fewer resources

SLIDE 22

Related Work

• Focused on storage enclosures
• Detailed models and seeding (require monitoring changes)
• Lack of prediction of the total execution time for workflow applications
• Machine learning

SLIDE 23

Workload Description

I/O trace per task

– read, write
– size, offset

Task dependency graph
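Concretely, the workload description might be laid out as below; the field names are assumptions, not the system's actual trace format.

```python
# Illustrative shape of a workload description: one I/O trace per task
# (read/write operations with size and offset) plus the dependency
# graph between tasks. Field names are hypothetical.
workload = {
    "traces": {
        "t1": [{"op": "write", "file": "f1", "size": 4096, "offset": 0}],
        "t2": [{"op": "read",  "file": "f1", "size": 4096, "offset": 0},
               {"op": "write", "file": "f2", "size": 8192, "offset": 0}],
    },
    "deps": [("t1", "t2")],  # t2 may start only after t1 finishes
}
print(len(workload["traces"]), "tasks,", len(workload["deps"]), "dependency")
```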

SLIDE 24

BLAST: CPU hours

SLIDE 25

Platform Example – Argonne BlueGene/P

[Architecture figure: compute nodes, I/O nodes, switch complex, and GPFS backend]
– 160K cores; high-speed network, 2.5 GBps per node
– 2.5K I/O nodes; 850 MBps per 64 compute nodes
– 10 Gb/s switch complex
– GPFS backend: 24 servers; I/O rate: 8 GBps = 51 KBps / core

– Nodes are dedicated to an application
– The storage system is coupled with the application’s execution
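The per-core figure on the slide follows from dividing the backend rate by the core count; a quick check, assuming binary units.

```python
# Sanity check of the slide's arithmetic: 8 GiB/s of backend bandwidth
# shared by 160Ki cores leaves roughly 51 KiB/s per core.
backend_bw = 8 * 2**30      # GPFS aggregate I/O rate, bytes/s
cores = 160 * 2**10         # "160K" cores
print(backend_bw / cores / 2**10)  # -> 51.2 (KiB/s per core)
```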

SLIDE 26

Tuning is Hard

– Defining target values can be hard
– Understanding distributed systems, applications, and application workloads is complex
– The workload or infrastructure can change
– Tuning is time-consuming

SLIDE 27

Storage System

SLIDE 28

Montage Example


Tasks communicate via shared files

SLIDE 29

Storage System

– Metadata manager
– Storage module
– Client module

SLIDE 30

Configuration Loop

1. Define a target performance
2. Run the application (replaced by the performance predictor)
3. Analyze system activity
4. Identify parameters

Costly

SLIDE 31

Intermediate Storage System

– Storage system co-deployed on the compute nodes
– Avoids the backend storage as a bottleneck
– Opportunity to configure per application