SLIDE 1

Evaluating the performance of skeleton-based high level parallel programs

(funded by the EPSRC, grant number GR/S21717/01)

Enhancing the Performance Predictability of Grid Applications with Patterns and Process Algebras

A. Benoit, M. Cole, S. Gilmore, J. Hillston

http://www.inf.ed.ac.uk

abenoit1@inf.ed.ac.uk – Enhance Meeting – 18 February 2004 – 1

SLIDE 2

Motivations

Grid technologies:

widely distributed collections of computers
difficult issues of resource allocation and scheduling

Skeleton-based programming:

commonly used patterns
models with Process Algebra

Enhance the performance of Grid applications

(performance results lead to “good” scheduling decisions)

SLIDE 3

Structure of the talk

Introduction
The Pipeline skeleton

Principle of the skeleton
Process Algebra Model

AMoGeT: The Automatic Model Generation Tool

Overview and input files
Different functionalities

Numerical results
Conclusions and Perspectives

SLIDE 4

Introduction - Grid and skeletons

Grid Applications

unpredictability of resource availability and performance
scheduling issues
rescheduling techniques may be useful

Skeleton based programming

library of skeletons
many real applications can use these skeletons
modularity, configurability
Edinburgh Skeleton Library eSkel (MPI-based)

SLIDE 5

Introduction - Performance evaluation

Use of a particular skeleton:

information about implied scheduling dependencies

Model with stochastic process algebras (PEPA)

include aspects of uncertainty inherent to Grids
automated modelling process
dynamic monitoring of resource performance

allow better scheduling decisions, and adaptive rescheduling of applications

SLIDE 6

Introduction - Related projects

The Network Weather Service – R. Wolski et al.

benchmarking and monitoring techniques for the Grid
no skeletons and no performance models

ICENI project – N. Furmento et al.

performance models to improve the scheduling decisions
no skeletons, models = graphs which approximate data

Use of skeleton programs within grid nodes – M. Alt et al.

each server provides a function capturing the cost of its implementation of each skeleton
each skeleton runs only on one server
scheduling = select the most appropriate servers

SLIDE 7

Introduction - Main contribution

Single skeletons which span the Grid
Skeletons modelled in a generic way using stochastic process algebras
Performance results: allow a dynamic rescheduling of the Grid application to enhance its performance
Significant results obtained on a first case study based on the Pipeline skeleton

SLIDE 9

Pipeline - Principle of the skeleton

[Figure: inputs → Stage 1 → Stage 2 → ... → Stage $N_s$ → outputs]

$N_s$ stages process a sequence of inputs to produce a sequence of outputs
All input passes through each stage in the same order
The internal activity of a stage may be parallel, but this is transparent to our model
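The ordering property of the Pipeline can be illustrated with a minimal Python sketch. It only mimics the semantics stated on this slide (every input visits every stage, in the same order); it is sequential, it is not eSkel code, and the function name `run_pipeline` and the three example stages are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def run_pipeline(stages: List[Callable], inputs: Iterable) -> Iterator:
    """Pass every input through all stages, preserving the input order."""
    stream = iter(inputs)
    for stage in stages:
        # Chain each stage lazily onto the output stream of the previous one.
        stream = map(stage, stream)
    return stream


# Three hypothetical stages (N_s = 3), chosen only for illustration.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs = list(run_pipeline(stages, [1, 2, 3]))
print(outputs)
```

How each stage computes its result internally (possibly in parallel) is hidden behind the callable, which matches the remark that internal parallelism is transparent to the model.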

SLIDE 10

Pipeline - Model

Model expressed in Performance Evaluation Process Algebra (PEPA) [Hillston]
Mapping of the application onto the computing resources: the network and the processors

SLIDE 11

Pipeline - Application model

Application model: independent of the resources
1 PEPA component per stage of the pipeline ($s = 1..N_s$)

$$\mathit{Stage}_s \stackrel{\mathrm{def}}{=} (\mathit{move}_s, \top).(\mathit{process}_s, \top).(\mathit{move}_{s+1}, \top).\mathit{Stage}_s$$

Sequential component: gets data ($\mathit{move}_s$), processes it ($\mathit{process}_s$), moves the data to the next stage ($\mathit{move}_{s+1}$)
Unspecified rates ($\top$): depends on the resources
Pipeline application = cooperation of the stages

$$\mathit{Pipeline} \stackrel{\mathrm{def}}{=} \mathit{Stage}_1 \bowtie_{\mathit{move}_2} \mathit{Stage}_2 \bowtie_{\mathit{move}_3} \cdots \bowtie_{\mathit{move}_{N_s}} \mathit{Stage}_{N_s}$$

$\mathit{move}_1$: arrival of an input in the application
$\mathit{move}_{N_s+1}$: transfer of the final output out of the Pipeline
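Because the application model has the same shape for any number of stages, it lends itself to automatic generation. The sketch below emits the stage and Pipeline definitions as text. The concrete syntax (`infty` for the unspecified rate, `<moveK>` for a cooperation set, a trailing `;`) is an assumption chosen for illustration, and the function names are hypothetical; the slides do not show the tool's actual output format.

```python
def stage_component(s: int) -> str:
    """Stage_s = (move_s, T).(process_s, T).(move_{s+1}, T).Stage_s"""
    return (f"Stage{s} = (move{s}, infty).(process{s}, infty)."
            f"(move{s + 1}, infty).Stage{s};")


def pipeline_component(n_stages: int) -> str:
    """Pipeline = Stage1 <move2> Stage2 <move3> ... <moveN> StageN"""
    expr = "Stage1"
    for s in range(2, n_stages + 1):
        # Adjacent stages cooperate on the move activity they share.
        expr += f" <move{s}> Stage{s}"
    return f"Pipeline = {expr};"


for s in range(1, 4):
    print(stage_component(s))
print(pipeline_component(3))
```

Note that move1 and move4 (for three stages) appear in no cooperation set inside the Pipeline: they are the arrival of an input and the departure of the final output.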

SLIDE 12

Pipeline - Network model

Network model: information about the efficiency of the link connection between pairs of processors
Assign rates $\lambda_s$ to the $\mathit{move}_s$ activities ($s = 1..N_s+1$)

$$\mathit{Network} \stackrel{\mathrm{def}}{=} (\mathit{move}_1, \lambda_1).\mathit{Network} + \cdots + (\mathit{move}_{N_s+1}, \lambda_{N_s+1}).\mathit{Network}$$

$\lambda_s$ represents the connection between the processor $j_{s-1}$ hosting stage $s-1$ and the processor $j_s$ hosting stage $s$
Special cases:
$j_0$ is the processor providing inputs to the Pipeline
$j_{N_s+1}$ is where we want the outputs to be delivered

SLIDE 13

Pipeline - Processors model

Processors model: Application mapped on a set of $N_p$ processors
Rate $\mu_s$ of the $\mathit{process}_s$ activities ($s = 1..N_s$): load of the processor, and other performance information
One stage per processor ($N_p = N_s$; $s = 1..N_s$):

$$\mathit{Proc}_s \stackrel{\mathrm{def}}{=} (\mathit{process}_s, \mu_s).\mathit{Proc}_s$$

Several stages per processor:

$$\mathit{Proc}_1 \stackrel{\mathrm{def}}{=} (\mathit{process}_1, \mu_1).\mathit{Proc}_1 + (\mathit{process}_2, \mu_2).\mathit{Proc}_1$$

Set of processors: parallel composition

$$\mathit{Processors} \stackrel{\mathrm{def}}{=} \mathit{Proc}_1 \parallel \mathit{Proc}_2 \parallel \cdots \parallel \mathit{Proc}_{N_p}$$
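The processor components follow directly from a mapping of stages to processors: a processor hosting several stages offers a choice between their process activities. A minimal sketch, assuming a mapping is given as a tuple of processor indices (one per stage) and using a hypothetical textual syntax with symbolic rate names `mu1`, `mu2`, ...:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stages_per_processor(mapping: Sequence[int]) -> Dict[int, List[int]]:
    """Group stage indices (1-based) by the processor that hosts them."""
    hosted = defaultdict(list)
    for stage, proc in enumerate(mapping, start=1):
        hosted[proc].append(stage)
    return dict(hosted)


def processor_component(proc: int, stages: List[int]) -> str:
    """Proc_j = sum over hosted stages s of (process_s, mu_s).Proc_j"""
    choices = " + ".join(f"(process{s}, mu{s}).Proc{proc}" for s in stages)
    return f"Proc{proc} = {choices};"


# Stages 1 and 2 on processor 1, stage 3 on processor 2.
hosted = stages_per_processor((1, 1, 2))
components = [processor_component(j, s) for j, s in sorted(hosted.items())]
print("\n".join(components))
```

Here the components are indexed by processor rather than by stage, so the "several stages per processor" case on the slide corresponds to `Proc1` in the output.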

SLIDE 14

Pipeline - Overall model

The overall model is the mapping of the stages onto the processors and the network by using the cooperation combinator

$L_p = \{\mathit{process}_s\}_{s=1..N_s}$: synchronize Pipeline and Processors
$L_m = \{\mathit{move}_s\}_{s=1..N_s+1}$: synchronize Pipeline and Network

$$\mathit{Mapping} \stackrel{\mathrm{def}}{=} \mathit{Network} \bowtie_{L_m} \mathit{Pipeline} \bowtie_{L_p} \mathit{Processors}$$
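As an illustration, for a Pipeline with $N_s = 3$ stages the two cooperation sets and the overall model would expand as follows. This instance is worked out here from the general definitions; it does not appear on the slides.

```latex
% Worked instance for N_s = 3 (illustrative expansion of the general model)
\[
L_p = \{\mathit{process}_1, \mathit{process}_2, \mathit{process}_3\},
\qquad
L_m = \{\mathit{move}_1, \mathit{move}_2, \mathit{move}_3, \mathit{move}_4\}
\]
\[
\mathit{Mapping} \stackrel{\mathrm{def}}{=}
  \mathit{Network} \bowtie_{L_m}
  \bigl(\mathit{Stage}_1 \bowtie_{\mathit{move}_2} \mathit{Stage}_2
        \bowtie_{\mathit{move}_3} \mathit{Stage}_3\bigr)
  \bowtie_{L_p} \mathit{Processors}
\]
```

Every move activity is timed by the Network component and every process activity by a processor component, which is why the stages themselves can leave all rates unspecified.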

SLIDE 16

AMoGeT - Overview

[Figure: AMoGeT workflow. Inputs: description files and performance information from NWS. Steps: generate models (PEPA models), solve models (results), compare results.]

AMoGeT: Automatic Model Generation Tool
Standalone prototype
Ultimate role: integrated component of a run-time scheduler and re-scheduler

SLIDE 17

AMoGeT - Description files (1)

Specify the names of the processors

file hosts.txt: list of IP addresses
rank $i$ in the list → processor $i$
processor 1 is the reference processor

wellogy.inf.ed.ac.uk
bw240n01.inf.ed.ac.uk
bw240n02.inf.ed.ac.uk
france.imag.fr
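Reading such a host list amounts to numbering the entries from 1. A minimal sketch, assuming one host per line and ignoring blank lines (the slides do not specify the file format beyond the example; `rank_hosts` is a hypothetical helper name):

```python
from typing import Dict, List


def rank_hosts(lines: List[str]) -> Dict[int, str]:
    """Map rank i (1-based position in the list) to the host name."""
    hosts = [ln.strip() for ln in lines if ln.strip()]
    return {rank: host for rank, host in enumerate(hosts, start=1)}


# Content of the example hosts.txt shown on the slide.
example = """wellogy.inf.ed.ac.uk
bw240n01.inf.ed.ac.uk
bw240n02.inf.ed.ac.uk
france.imag.fr
"""
processors = rank_hosts(example.splitlines())
reference = processors[1]  # processor 1 is the reference processor
print(reference)
```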

SLIDE 18

AMoGeT - Description files (2)

Describe the modelled application mymodel

file mymodel.des
stages of the Pipeline: number of stages $N_s$ and time $tr_s$ (sec) required to compute one output for each stage $s = 1..N_s$ on the reference processor

nbstage=$N_s$; tr1=10; tr2=2; ...

mappings of stages to processors: location of the input data, the processor where each stage is processed, and where the output data must be left.

mappings=[1,(1,2,3),1],[1,(1,1,1),1];
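A description in this style can be read with a few regular expressions. The sketch below is an assumption-laden illustration: it accepts only the `key=value;` forms visible in the example, the function name `parse_description` is hypothetical, and the value `tr3=5` in the sample is invented to complete the elided part of the slide's example.

```python
import re
from typing import Dict, List, Tuple

# (input processor, processors hosting stages 1..N_s, output processor)
Mapping = Tuple[int, Tuple[int, ...], int]


def parse_description(text: str) -> Tuple[int, Dict[int, float], List[Mapping]]:
    """Extract nbstage, the tr<s> times and the mappings from a description."""
    # Raises AttributeError if nbstage is missing; acceptable for a sketch.
    n_stages = int(re.search(r"nbstage\s*=\s*(\d+)", text).group(1))
    # tr<s>=<seconds> entries, keyed by stage index s.
    times = {int(s): float(t)
             for s, t in re.findall(r"tr(\d+)\s*=\s*([0-9.]+)", text)}
    mappings = [
        (int(src), tuple(int(p) for p in procs.split(",")), int(dst))
        for src, procs, dst in re.findall(
            r"\[\s*(\d+)\s*,\s*\(([\d\s,]+)\)\s*,\s*(\d+)\s*\]", text)
    ]
    return n_stages, times, mappings


# tr3=5 is a made-up value; the slide elides everything after tr2.
sample = "nbstage=3; tr1=10; tr2=2; tr3=5; mappings=[1,(1,2,3),1],[1,(1,1,1),1];"
n_stages, times, mappings = parse_description(sample)
print(n_stages, times, mappings)
```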

SLIDE 19

AMoGeT - Using the Network Weather Service

The Network Weather Service (NWS)

Wolski et al.: dynamic forecast of the performance of network and computational resources

Just a few scripts to run on the monitored nodes
Information we use:

$av_i$: fraction of CPU available to a newly-started process on the processor $i$
$la_{i,j}$: latency (in ms) of a communication from processor $i$ to processor $j$
$cpu_i$: frequency of the processor $i$ in MHz (/proc/cpuinfo)

SLIDE 20

AMoGeT - Generating the models

One Pipeline model per mapping
Problem: computing the rates

Stage $s$ ($s = 1..N_s$) hosted on processor $j$ (and a total of $nb_j$ stages hosted on this processor):

$$\mu_s = \frac{av_j}{nb_j} \times \frac{cpu_j}{cpu_1} \times \frac{1}{tr_s}$$

Rate $\lambda_s$ ($s = 1..N_s+1$): connection link between the processor $j_{s-1}$ hosting stage $s-1$ and the processor $j_s$ hosting stage $s$:

$$\lambda_s = \frac{10^3}{la_{j_{s-1},j_s}}$$

(special cases: stage 0 = input and stage $N_s+1$ = output)
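The two rate formulas translate directly into code. The sketch below assumes the monitoring data is available as dictionaries keyed by processor index (`av`, `cpu`) or by processor pair (`la`), and that stage times `tr` are keyed by stage index; these data structures and the function names are illustrative choices, not the tool's interface.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def stage_rates(stage_procs: Sequence[int], av: Dict[int, float],
                cpu: Dict[int, float], tr: Dict[int, float]) -> List[float]:
    """mu_s = (av_j / nb_j) * (cpu_j / cpu_1) * (1 / tr_s), j hosting stage s."""
    nb = Counter(stage_procs)  # nb_j: number of stages hosted on processor j
    return [av[j] / nb[j] * cpu[j] / cpu[1] / tr[s]
            for s, j in enumerate(stage_procs, start=1)]


def link_rates(src: int, stage_procs: Sequence[int], dst: int,
               la: Dict[Tuple[int, int], float]) -> List[float]:
    """lambda_s = 10^3 / la_{j_{s-1}, j_s}, with j_0 = src and j_{N_s+1} = dst."""
    path = [src, *stage_procs, dst]
    # Latencies are in ms, so 10^3 / la gives a rate per second.
    return [1e3 / la[(a, b)] for a, b in zip(path, path[1:])]


# Hypothetical monitoring data: three identical, fully available processors.
av = {1: 1.0, 2: 1.0, 3: 1.0}
cpu = {1: 1000.0, 2: 1000.0, 3: 1000.0}
tr = {1: 0.1, 2: 0.1, 3: 0.1}
la = {(i, j): 0.1 for i in (1, 2, 3) for j in (1, 2, 3)}

mu = stage_rates((1, 1, 2), av, cpu, tr)
lam = link_rates(1, (1, 1, 2), 1, la)
print(mu, lam)
```

With stages 1 and 2 sharing processor 1, their rates are halved relative to stage 3, which has processor 2 to itself.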

SLIDE 21

AMoGeT - Solving the models

Numerical results obtained with the PEPA Workbench

Performance result: throughput of the $\mathit{move}_s$ activities = throughput of the application

identical for all $s$ (Pipeline)
compute the throughput of $\mathit{move}_1$
expression of the result in the model: T = $\lambda_1$ * {||(Stage1||)||(||)}
||: separator between model components
**: wild cards
result obtained via a simple command line

SLIDE 22

AMoGeT - Comparing the results

All the results saved in a single file
Which mapping produces the best throughput?
Use this mapping to run the application
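The comparison step reduces to picking the mapping with the highest throughput, keeping ties. A minimal sketch, assuming the solved results have already been loaded into a dictionary from mapping to throughput (the format of the results file is not shown on the slides, and the numbers below are placeholders, not results from the talk):

```python
from typing import Dict, List, Tuple


def best_mappings(throughput: Dict[Tuple[int, ...], float]) -> List[Tuple[int, ...]]:
    """Return every mapping that attains the highest throughput."""
    best = max(throughput.values())
    return sorted(m for m, t in throughput.items() if t == best)


# Placeholder values for illustration only.
results = {(1, 2, 3): 3.0, (1, 3, 2): 3.0, (1, 1, 1): 1.5}
print(best_mappings(results))
```

Returning all tied mappings leaves the final choice to the scheduler, which may have reasons outside the model to prefer one of them.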

SLIDE 24

Numerical Results

Example: 3 Pipeline stages, up to 3 processors
27 states, 51 transitions

less than 1 second to solve (similarly with up to

❶

stages)

Parameters:

$la_{j,j} = 0.1$ ms for $j = 1..3$ and $la_{j_1,j_2} = la_{j_2,j_1}$
$cpu_j$ identical for all processors $j = 1..3$
$tr_s = tr$ identical for all stages $s = 1..3$
For $j = 1..3$, $t_j = tr / av_j$, and so

$$\mu_s = \frac{1}{nb_j} \times \frac{1}{t_j}$$

SLIDE 25

Numerical Results

No need to transfer the input or the output data
Mappings compared: all the mappings with the first stage on the first processor (mappings [1,(1,*,*),*])

Experiment 1: Processors identical and fast network links ($la_{i,j} = 0.1$ for all pairs of processors $i, j$)

$t_j = 0.1$ (for all processors): optimal mappings (1,2,3) and (1,3,2) with a throughput of 5.64
$t_j = 0.2$: same optimal mappings (one stage on each processor), but throughput divided by 2 (2.82)
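The set of mappings compared in these experiments can be enumerated mechanically. The sketch below lists only the stage-to-processor part of each mapping, with stage 1 pinned to processor 1; the input and output locations are left out because no data transfer is needed for them here. The function name `candidate_mappings` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def candidate_mappings(n_stages: int, n_procs: int) -> List[Tuple[int, ...]]:
    """All stage-to-processor assignments with stage 1 fixed on processor 1."""
    procs = range(1, n_procs + 1)
    return [(1, *rest) for rest in product(procs, repeat=n_stages - 1)]


mappings = candidate_mappings(3, 3)
print(len(mappings), mappings)
```

For 3 stages and 3 processors this yields 9 candidates, including the two reported as optimal in Experiment 1, (1,2,3) and (1,3,2).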

SLIDE 26

Numerical Results

Experiment 2: Processor 3 slower than the others ($t_1 = t_2 = 0.1$, and $t_3 = 1$) - Identical network links

$la_{1,2} = la_{2,3} = la_{1,3} = 0.1$: (1,2,1) - 3.37
$la_{1,2} = la_{2,3} = la_{1,3} = 100$: (1,1,2) and (1,2,2) - 2.60
$la_{1,2} = la_{2,3} = la_{1,3} = 1000$: (1,1,1) - 1.88

avoid the use of processor 3, and avoid data transfer when the network links become busy

SLIDE 27

Numerical Results

Experiment 3: The network connection to processor 3 is slow ($la_{1,2} = 100$; $la_{1,3} = la_{2,3} = 1000$)

$t_1 = t_2 = t_3 = 0.1$: (1,1,2) and (1,2,2) - 2.60
$t_1 = t_2 = 1$; $t_3 = 0.1$: (1,3,3) - 0.49

avoid the use of processor 3 when all processors are identical
use processor 3 only when the other ones are slower (even if more time will be spent in communications)

SLIDE 29

Conclusions

Use of skeletons and performance models to improve the performance of Grid applications

Pipeline skeleton
Tool AMoGeT which automates all the steps to obtain the result easily

Models: help us to choose the mapping to produce the best throughput of the application
Use of the Network Weather Service to obtain realistic models

SLIDE 30

Perspectives

Provide more detailed timing information on the tool to prove its usefulness
Extension to other skeletons
Experiments with a realistic application on a heterogeneous computational Grid

First case study: we have the potential to enhance the performance of Grid applications with the use of skeletons and process algebras
