First Steps Towards Automatically Building Network Representations


SLIDE 1

First Steps Towards Automatically Building Network Representations

Lionel Eyraud-Dubois
ENS-Lyon, France.

Arnaud Legrand
CNRS, Grenoble, France.

Martin Quinson
Nancy University, France.

Frédéric Vivien
INRIA, Lyon, France.

Euro-Par’07
Rennes, August 2007

SLIDE 3

Scheduling on a large-scale distributed platform

◮ Let $G_P = (V_P, E_P)$ denote the platform graph

[Figure: example platform graph on nodes P1, P2, P3 and P4, with edge labels 1, 1, 1, 1 and 10]

◮ Each edge $P_i \to P_j$ is labeled by $c_{i,j}$: the time necessary to send a unit-size message between $P_i$ and $P_j$
◮ Communication model:
  ◮ full overlap of communications and computations
  ◮ 1-port for incoming communications and 1-port for outgoing communications
◮ Each node $P_i$ has a processing speed $w_i \in \mathbb{R}$
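
The model above can be made concrete with a minimal Python sketch. The `Platform` class, its method names and the example costs are assumptions made for illustration (the topology drawn on the slide is not recoverable here); only the meaning of the edge cost, the node speed and the 1-port rule come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Platform graph: node speeds w_i and edge costs c_ij (hypothetical helper)."""
    speed: dict                               # node -> w_i
    cost: dict = field(default_factory=dict)  # (src, dst) -> c_ij, time per unit-size message

    def send_time(self, src, dst, size=1.0):
        # A message of `size` units takes size * c_ij on the edge src -> dst.
        return size * self.cost[(src, dst)]

    def one_port_send_time(self, src, messages):
        # 1-port outgoing model: src emits its messages one after another,
        # so the individual transfer times add up.
        return sum(self.send_time(src, dst, size) for dst, size in messages)


# Illustrative values only (not the topology drawn on the slide).
platform = Platform(
    speed={"P1": 1.0, "P2": 1.0, "P3": 1.0},
    cost={("P1", "P2"): 1.0, ("P1", "P3"): 10.0},
)
total = platform.one_port_send_time("P1", [("P2", 2.0), ("P3", 1.0)])
```

Here `total` is 2 × 1 + 1 × 10 = 12 time units, because the two sends from P1 cannot overlap under the 1-port assumption.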

Eh wait! How did you get the graph?!

Eyraud, Legrand, Quinson, Vivien ALNeM: Automatically Building Network Representations Introduction 2/22

SLIDE 4

Building a network representation

Motivation

◮ Modern platforms are heterogeneous and dynamic
◮ Distributed applications must be network-aware and reactive
◮ Information on the network is needed (at least) for:
  ◮ Service and distributed application deployment
  ◮ Communication-aware scheduling
  ◮ Group communication
  ◮ Proximity Neighbor Selection in P2P systems

Several levels of information (depending on the OSI layer)

◮ Physical interconnection map (wires in the walls)
◮ Routing infrastructure (path of network packets, from router to switch)
◮ Application level (focus on effects, i.e. bandwidth and latency, not causes)

Network mapping process

◮ Step 1: (End-to-end) measurements
◮ Step 2: Reconstruct a graph

SLIDE 6

Classical measurements in a grid environment?

Use of low-level network protocols (like SNMP or BGP)

◮ Example: Remos
◮ Use of SNMP restricted for security reasons (DoS or spying)

Use of traceroute or ping (i.e. relying on ICMP)

◮ Examples: TopoMon, Lumeta, IDmaps, Global Network Positioning
◮ Use of ICMP more and more restricted by admins (for security reasons)

"Over the lifetime of the project, we have noticed that the number of replying destinations in our lists decays at the rate of 2-3% per month." (Authors of the Skitter project)

Pathchar

◮ Works without privilege on the network, but must be root on hosts

⇒ not adapted to Grid settings

Measurements must be at application-level (no privilege)

SLIDE 8

Solutions relying on application-level measurements

NWS (Network Weather Service, UCSB)

◮ Reports bandwidth, latency, CPU availability, and future trends
◮ Only quantitative values, no topological information (but one can label a big clique with NWS-provided values)

ENV (Effective Network View, UCSD)

◮ Uses interference measurements to build a tree representation

ECO (Efficient Collective Communication, CMU)

◮ Uses application-level measurements to optimize collective communications
◮ Should be generalized

Existing reconstruction algorithms

◮ Cliques (NWS, ECO) or trees (ENV, classical latency clustering)

Our goal

◮ Assess quality of clique and spanning tree algorithms
◮ Propose original approaches

SLIDE 9

Outline

Introduction
State of the art
ALNeM goals and architecture
Reconstruction algorithms
  Basic reconstruction algorithms
  Improved spanning tree
  Aggregation
Experimental evaluation
  Renater platform
  GridG platforms
Conclusion and perspectives

SLIDE 14

ALNeM (Application-Level Network Mapper)

Presentation

◮ Long-term goal: be a tool providing topology to network-aware applications
◮ Short-term goal: allow the study of network mapping algorithms

[Figure: sensors (S) around a database (DB), and three reconstruction algorithms (Algorithm 1, 2, 3) whose outputs carry the labels "Right platform", "Wrong topology" and "Wrong values"]

Architecture

◮ Lightweight distributed measurement infrastructure (collection of sensors)
◮ MySQL measurement database
◮ Topology builder, with several reconstruction algorithms

Development on simulator, use in real life

◮ Implemented using GRAS (same code running in both contexts)

SLIDE 17

Evaluation methodology

Goal: Quantify similarity between initial and reconstructed platforms. Not so easy!

4 evaluation approaches

◮ Visual evaluation (structural comparison)
◮ Compare end-to-end measurements (communication-level)
◮ Compare interference amount:

  $\mathrm{Interf}((a, b), (c, d)) = 1 \iff \frac{BW(a \to b)}{BW(a \to b \parallel c \to d)} \approx 2$

◮ Compare application running times (application-level)

  Application                      Comm. schema   // comm   # steps
  Token-ring                       Ring           No        1
  Broadcast                        Tree           No        1
  All2All                          Clique         Yes       1
  Parallel Matrix Multiplication   2D             Yes       √procs

Apply all approaches on several platforms

◮ In simulation: collect data on “real” platforms, compare reconstructed to initial
◮ In situ: most comparisons not applicable, so hard to assess quality, but still usable
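
The interference criterion on this slide can be sketched as a small predicate: two flows are flagged as interfering when the bandwidth of a → b roughly halves once c → d runs at the same time. The function name and the `tolerance` parameter are assumptions; the slide only says the ratio is "approximately 2".

```python
def interferes(bw_alone, bw_concurrent, tolerance=0.25):
    """Return 1 if bw_alone / bw_concurrent is close to 2, else 0.

    bw_alone:      bandwidth of a -> b measured on its own
    bw_concurrent: bandwidth of a -> b measured while c -> d is also running
    tolerance:     accepted distance from the ratio 2 (an assumed threshold)
    """
    if bw_alone <= 0 or bw_concurrent <= 0:
        raise ValueError("bandwidths must be positive")
    ratio = bw_alone / bw_concurrent
    return 1 if abs(ratio - 2.0) <= tolerance else 0


shared_link = interferes(100.0, 52.0)    # ratio about 1.92: flagged as interfering
disjoint_paths = interferes(100.0, 98.0)  # ratio about 1.02: not flagged
```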

SLIDE 19

Basic reconstruction algorithms

Clique

◮ Connect all pairs of nodes, label with measured values

Spanning trees

◮ Use edges with lowest latency or highest bandwidth
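
Both basic reconstructions fit in a few lines of Python. This is a sketch under assumptions: the slides do not say which spanning-tree algorithm ALNeM uses, so Kruskal's algorithm is used here as one standard choice, and the function names and the sample measurements are invented for the example.

```python
from itertools import combinations


def build_clique(nodes, measure):
    """Connect every pair of nodes and label the edge with its measured value."""
    return {(a, b): measure[(a, b)] for a, b in combinations(sorted(nodes), 2)}


def spanning_tree(nodes, measure, prefer_low=True):
    """Kruskal's algorithm: lowest values first (latency) or highest first (bandwidth)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    edges = sorted(build_clique(nodes, measure).items(),
                   key=lambda item: item[1], reverse=not prefer_low)
    tree = {}
    for (a, b), value in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # keep the edge only if it joins two components
            parent[root_a] = root_b
            tree[(a, b)] = value
    return tree


# Made-up measurements between three hosts, keyed by sorted node pairs.
nodes = ["A", "B", "C"]
latency = {("A", "B"): 1.0, ("A", "C"): 5.0, ("B", "C"): 2.0}
bandwidth = {("A", "B"): 100.0, ("A", "C"): 10.0, ("B", "C"): 50.0}

clique = build_clique(nodes, latency)
latency_tree = spanning_tree(nodes, latency, prefer_low=True)
bandwidth_tree = spanning_tree(nodes, bandwidth, prefer_low=False)
```

On this toy input both trees keep the edges A-B and B-C and drop A-C, which is exactly the kind of missing link the later slides discuss.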

SLIDE 22

New heuristic: Improved spanning tree

Algorithm

◮ Add links to the spanning trees, to improve predictions
◮ Connect the closest badly predicted nodes (latency over-estimated)
◮ Update the routing if it improves the predictions
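
One possible reading of these three steps is sketched below in Python. It is not the authors' implementation: shortest-path routing, the 10% `threshold` for "badly predicted", and the function names are all assumptions. Recomputing shortest paths after each added link stands in for the "update the routing" step.

```python
def predicted_latencies(nodes, edges):
    """All-pairs shortest-path latencies (Floyd-Warshall) over undirected edges."""
    inf = float("inf")
    dist = {(a, b): (0.0 if a == b else inf) for a in nodes for b in nodes}
    for (a, b), lat in edges.items():
        dist[(a, b)] = dist[(b, a)] = min(dist[(a, b)], lat)
    for k in nodes:
        for a in nodes:
            for b in nodes:
                via_k = dist[(a, k)] + dist[(k, b)]
                if via_k < dist[(a, b)]:
                    dist[(a, b)] = via_k
    return dist


def improve_tree(nodes, tree, measured, threshold=0.1):
    """Add direct links until no pair is over-estimated by more than `threshold`."""
    edges = dict(tree)
    while True:
        pred = predicted_latencies(nodes, edges)
        bad = [pair for pair, meas in measured.items()
               if pair not in edges and pred[pair] > meas * (1 + threshold)]
        if not bad:
            return edges
        # "Closest" badly predicted pair: the one with the smallest measured latency.
        closest = min(bad, key=lambda pair: measured[pair])
        edges[closest] = measured[closest]


# Made-up example: the tree predicts 3.0 for A-C, but 1.5 was measured.
nodes = ["A", "B", "C"]
measured = {("A", "B"): 1.0, ("B", "C"): 2.0, ("A", "C"): 1.5}
tree = {("A", "B"): 1.0, ("B", "C"): 2.0}
improved = improve_tree(nodes, tree, measured)
```

The loop terminates because every iteration adds an edge that was not yet present, and there are finitely many node pairs.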

SLIDE 32

New heuristic: Aggregation

Algorithm

◮ Grow a set of connected nodes
◮ For each node:
  ◮ Repeatedly add edges to better predict the latency
  ◮ Until all routes from this node to the set are satisfied
◮ Refrain from adding “redundant” edges
  ◮ A long link (pred > 2 × meas) is suspected of redundancy with forthcoming links
  ◮ Indeed, the green link is redundant, and is thus removed.
SLIDE 35

Experiments on simulator: Renater platform (1/4)

◮ Real platform built manually (real measurements + admin feedback)

Visual evaluation

[Figure: six graphs over the nodes lyon, strasbourg, belfort, grenoble, clermont, bordeaux, montpellier, marseille, nice, rennes and besancon: Real platform, Latency spanning tree, Bandwidth spanning tree, Aggregate, Latency improved spanning tree, Bandwidth improved spanning tree]

SLIDE 36

Experiments on simulator: Renater platform (2/4)

End-to-end measurements

[Figure: bar chart of accuracy (axis from 1.0 to 1.4) for bandwidth (BW) and latency (Lat), per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

How to read: the closer the accuracy is to 1 (lower bar), the better the result

Interpretation

◮ Clique: very good (of course)
◮ Spanning trees: missing links give bad predictions
◮ Improvement procedure helps
◮ Aggregate performs badly for bandwidth predictions
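
The slide does not define "accuracy", only that 1 is a perfect score and lower bars are better. A metric with that behaviour, assumed here purely for illustration, is the symmetric ratio between predicted and measured values, averaged geometrically over all host pairs. The function names are hypothetical.

```python
import math


def accuracy(predicted, measured):
    """Symmetric ratio >= 1; equals 1 only for a perfect prediction (assumed metric)."""
    return max(predicted / measured, measured / predicted)


def mean_accuracy(pairs):
    """Geometric mean of per-pair accuracies over (predicted, measured) tuples."""
    logs = [math.log(accuracy(p, m)) for p, m in pairs]
    return math.exp(sum(logs) / len(logs))


# Over- and under-estimating by the same factor are penalised equally.
over = accuracy(2.0, 1.0)
under = accuracy(1.0, 2.0)
overall = mean_accuracy([(1.0, 1.0), (4.0, 1.0)])
```

With this choice, a clique labeled with the measured values scores exactly 1 on end-to-end predictions, which matches the "very good (of course)" remark.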

SLIDE 37

Experiments on simulator: Renater platform (3/4)

Interference measurements

[Figure: bar chart of the number of occurrences (axis from 500 to 2500) of correct predictions, false positives and false negatives, together with the number of actual interferences, per algorithm: Clique, TreeBW, TreeLat, ImpTreeBW, ImpTreeLat, Aggregate]

◮ $\mathrm{Interf}_{init}(ab, cd) \stackrel{?}{=} \mathrm{Interf}_{recons}(ab, cd)$
◮ Tree*
  ◮ Misses some links
  ◮ Most existing interferences right
  ◮ Lots of false positives
◮ Clique
  ◮ Predicts no interference at all
  ◮ No false positives
  ◮ Very few actual interferences
◮ Improvement reduces false positives
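
The comparison between initial and reconstructed interferences amounts to a tally over all pairs of flows. The Python sketch below assumes each platform is summarised as a dictionary from a pair of flows to 0 or 1; the function name, the dictionary layout and the sample data are illustrative, not taken from ALNeM.

```python
def compare_interferences(initial, reconstructed):
    """Count agreements and disagreements between two interference maps.

    Both arguments map a pair of flows, e.g. (("a", "b"), ("c", "d")), to 0 or 1.
    Pairs missing from `reconstructed` are treated as "no interference".
    """
    counts = {"correct": 0, "false_pos": 0, "false_neg": 0, "actual": 0}
    for flows, real in initial.items():
        guess = reconstructed.get(flows, 0)
        counts["actual"] += real
        if guess == real:
            counts["correct"] += 1
        elif guess == 1:
            counts["false_pos"] += 1   # predicted but absent on the initial platform
        else:
            counts["false_neg"] += 1   # present on the initial platform but missed
    return counts


# Made-up data: a clique-like reconstruction that never predicts interference.
initial = {
    (("a", "b"), ("c", "d")): 1,
    (("a", "c"), ("b", "d")): 0,
    (("a", "d"), ("b", "c")): 1,
}
clique_like = {flows: 0 for flows in initial}
result = compare_interferences(initial, clique_like)
```

A reconstruction that never predicts interference produces no false positives by construction, but misses every actual interference.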

SLIDE 38

Experiments on simulator: Renater platform (4/4)

Application-level measurements

[Figure: bar chart of accuracy (axis ticks 1 and 2) for token, broadcast, all2all and pmm, per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

◮ Token and broadcast: same conclusions as for end-to-end measurements
  ◮ Clique good, trees a bit worse, improvements work
◮ All2all and pmm: a completely new light
  ◮ Clique dramatically underestimates times: no contention between parallel communications
  ◮ Tree* overestimate times: missing links (as before)
  ◮ Improved algorithms have good predictive power

SLIDE 39

Experiments on simulator: GridG platforms

◮ GridG is a synthetic platform generator [Lu, Dinda – SuperComputing03]
  Generates realistic platforms
◮ Experiment: 40 platforms (60 hosts, default GridG parameters)

End-to-end measurements

[Figure: bar chart of accuracy (axis from 1.2 to 1.8) for bandwidth and latency, per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

Application-level measurements

[Figure: bar chart of accuracy (axis ticks 1, 2 and 4) for token, broadcast, all2all and pmm, per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

Interpretation

◮ Naive algorithms get bad results
◮ Improved trees yield good reconstructions
  ◮ ImpTreeBW error ≈ 3% for all2all (worst case)

SLIDE 42

Contributions of ALNeM

Completed a framework for reconstruction algorithm evaluation

◮ Several similarity criteria between initial and reconstructed platforms: visual (structural), end-to-end, interferences, application timings
◮ Allows comparison of reconstruction algorithms from the application point of view
◮ Runs on simulator or in situ thanks to GRAS (& SimGrid)

Analyzed classical algorithms and proposed original ones

◮ Evaluated algorithms: Clique, Bandwidth or Latency Spanning Tree
◮ Evaluation conditions: simulator, both real and synthetic platforms

Conclusion

◮ Classical algorithms are not satisfactory
  ◮ Spanning trees: miss edges, leading to performance under-estimation
  ◮ Clique: does not capture any existing interference
◮ Improving spanning trees yields much better results
  ◮ Especially ImpTreeBW: uses both kinds of measurements

SLIDE 44

Adding routers to the picture

◮ New set of experiments: only leaf nodes run the measurement processes

End-to-end measurements

[Figure: bar chart of accuracy (axis ticks 1, 2 and 4) for bandwidth (BW) and latency (Lat), per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

Application-level measurements

[Figure: bar chart of accuracy (axis ticks 1, 2 and 4) for token, broadcast, all2all and pmm, per algorithm: Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]

Interpretation

◮ None of the proposed heuristics is satisfactory
◮ Future work: improve this!

SLIDE 45

Future work on the ALNeM project

Better reconstruction algorithms

◮ Mix bandwidth and latency values
◮ Build graphs with internal (hidden) nodes

Other measurements from the sensors (new inputs to algorithms)

◮ Interference (but very expensive to acquire)
◮ Packet gap and back-to-back packets
◮ Packet loss, etc.

Method based on successive refinements

1. Spanning tree as first approximation
2. Refinement by adding some missing links
3. Some (not all) interference measurements to double-check the result

Far future:

◮ Adapt to condition changes (bandwidth variation, node arrival/departure)
◮ Distribute the tool to end-users

SLIDE 46

Thank you for your attention. Any questions?

SLIDE 47

Experiments on a real platform: Grid’5000

The Grid’5000 platform

◮ Test-bed for Grid researchers
◮ 9 sites in France, targets 5000 cores (2500 currently)

Graphical evaluation

[Figure: three graphs over the clusters gdx, bordeaux, toulouse, capricorne, sagittaire, icluster2, helios, azur, lille, parasol and paravent: Real platform, Latency spanning tree, Bandwidth spanning tree]

◮ Some links are missing, of course
◮ The bandwidth-induced graph is better, but G5K is maybe more focused on bandwidth

SLIDE 48

Experiments on a real platform: Grid’5000

End-to-end measurement

◮ Compare real measurements with those obtained in the simulator on the reconstructed platform

[Figure: performance factor (axis ticks 0.5, 1 and 2) for bandwidth and latency, for Clique, TreeBW and TreeLat]

◮ Clique very good, but trivial result
◮ Latency:
  ◮ Over-estimated when links are missing (longer path)
  ◮ Under-estimated when routing on G5K optimizes bandwidth
◮ Bandwidth mis-estimated:
  ◮ Technical issue in simulator: it assumes a constant TCP window size, but it varies with clusters in G5K (simulator validation issue)

SLIDE 54

Goals of the GRAS project: Easing infrastructure development

Development of real distributed applications using a simulator

[Figure: "Without GRAS", research produces simulation code that is rewritten into application code for development; "With GRAS", a single code base serves research and development on top of an API (other labels: SimGrid, GRDK, GRE)]

GRAS

◮ Framework for Rapid Development of Distributed Infrastructure
  ◮ Develop and tune on the simulator; deploy in situ without modification
    How: One API, two implementations
◮ Efficient Grid Runtime Environment (result = application = prototype)
  ◮ Performance concern: efficient communication of structured data
    How: Efficient wire protocol (avoid data conversion)
  ◮ Portability concern: because of grid heterogeneity
    How: ANSI C + autoconf + no dependency
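
The "one API, two implementations" idea can be illustrated with a small Python sketch. It does not reproduce the GRAS API (GRAS itself is written in ANSI C, per the slide); every class and function name below is hypothetical. The point is only that the application function `ping` is written once against an abstract interface and runs unchanged on an in-memory stand-in for a simulator and on real sockets.

```python
import socket
from abc import ABC, abstractmethod


class Channel(ABC):
    """The single API that application code is written against (hypothetical)."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> bytes: ...


class SimulatedChannel(Channel):
    """Simulator-side implementation: messages stay in memory."""

    def __init__(self):
        self._queue = []

    def send(self, payload):
        self._queue.append(payload)

    def receive(self):
        return self._queue.pop(0)


class SocketChannel(Channel):
    """In-situ implementation: messages cross a real socket pair."""

    def __init__(self):
        self._tx, self._rx = socket.socketpair()

    def send(self, payload):
        # Length-prefixed framing so the receiver knows how much to read.
        self._tx.sendall(len(payload).to_bytes(4, "big") + payload)

    def receive(self):
        size = int.from_bytes(self._recv_exact(4), "big")
        return self._recv_exact(size)

    def _recv_exact(self, count):
        data = b""
        while len(data) < count:
            chunk = self._rx.recv(count - len(data))
            if not chunk:
                raise ConnectionError("socket closed")
            data += chunk
        return data


def ping(channel: Channel) -> bytes:
    """Application code: identical whichever implementation is plugged in."""
    channel.send(b"ping")
    return channel.receive()


simulated_reply = ping(SimulatedChannel())
real_reply = ping(SocketChannel())
```

Keeping the application code ignorant of the back end is what lets the same program be tuned in simulation and then deployed without modification.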