First Steps Towards Automatically Building Network Representations
Lionel Eyraud-Dubois
ENS-Lyon, France.
Arnaud Legrand
CNRS, Grenoble, France.
Martin Quinson
Nancy University, France.
Frédéric Vivien
INRIA, Lyon, France.
◮ Let GP = (VP, EP) denote the platform graph
[Figure: example platform graph with nodes P1, P2, P3, P4 and edge labels 1, 1, 1, 1, 10]
◮ Each edge Pi → Pj is labeled by ci,j:
◮ Communication model:
  ◮ full overlap of communications and computations
  ◮ 1-port for incoming communications and 1-port for outgoing communications
◮ Each node Pi has a processing speed wi ∈ R
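A minimal sketch of how such a platform graph could be held in memory. The class, field and method names are hypothetical, and reading ci,j as the time needed to send one unit of data over the edge is an assumption: the slide only gives the notation. The five edges below are illustrative and do not reproduce the figure.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformGraph:
    """Platform graph G_P = (V_P, E_P) with edge labels c[i, j] and node speeds w[i]."""
    w: dict = field(default_factory=dict)  # node name -> processing speed w_i
    c: dict = field(default_factory=dict)  # (src, dst) -> edge label c_{i,j}

    def add_node(self, name, speed):
        self.w[name] = speed

    def add_edge(self, src, dst, cost):
        # Directed edge Pi -> Pj; both endpoints must already be declared.
        if src not in self.w or dst not in self.w:
            raise KeyError("declare both nodes before adding an edge")
        self.c[(src, dst)] = cost

    def transfer_time(self, src, dst, size):
        # ASSUMPTION: c_{i,j} is the time to send one unit of data on the edge.
        return size * self.c[(src, dst)]


# Hypothetical 4-node example; topology and values are illustrative only.
g = PlatformGraph()
for name in ("P1", "P2", "P3", "P4"):
    g.add_node(name, speed=1.0)
g.add_edge("P1", "P2", 1)
g.add_edge("P2", "P3", 1)
g.add_edge("P3", "P4", 1)
g.add_edge("P4", "P1", 1)
g.add_edge("P1", "P3", 10)
```

Storing edges as a dictionary keyed by (source, destination) keeps the graph directed, which matches the Pi → Pj notation.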
Eyraud, Legrand, Quinson, Vivien ALNeM: Automatically Building Network Representations Introduction 2/22
◮ Modern platforms are heterogeneous and dynamic
◮ Distributed applications must be network-aware and reactive
◮ Information on the network is needed (at least) for:
  ◮ Service and distributed application deployment
  ◮ Communication-aware scheduling
  ◮ Group communication
  ◮ Proximity Neighbor Selection in P2P systems
◮ Physical interconnection map (wires in the walls)
◮ Routing infrastructure (path of network packets, from router to switch)
◮ Application level (focus on effects – bandwidth & latency – not causes)
◮ Step 1: (End-to-end) measurements
◮ Step 2: Reconstruct a graph
◮ Example: Remos
◮ Use of SNMP restricted for security reasons (DoS or spying)
◮ Examples: TopoMon, Lumeta, IDmaps, Global Network Positioning
◮ Use of ICMP more and more restricted by admins (for security reasons)
◮ Works without privilege on the network, but must be root on hosts
◮ Reports bandwidth, latency, CPU availability, and future trends
◮ Only quantitative values, no topological information
◮ Use interference measurements to build a tree representation
◮ Use application-level measurements to optimize collective communications
◮ Should be generalized
◮ Cliques (NWS, ECO) or trees (ENV, Classical latency clustering)
◮ Assess quality of clique and spanning tree algorithms
◮ Propose original approaches
◮ Long-term goal: be a tool providing topology to network-aware applications
◮ Short-term goal: allow the study of network mapping algorithms
[Figure: sensors (S) feed the measurement database (DB); Algorithms 1, 2 and 3 each build a platform from it; outcome labels: right platform, wrong topology, wrong values]
◮ Lightweight distributed measurement infrastructure (collection of sensors)
◮ MySQL measurement database
◮ Topology builder, with several reconstruction algorithms
◮ Implemented using GRAS (same code running in both contexts: simulation and in situ)
◮ Visual evaluation (structural comparison)
◮ Compare end-to-end measurements (communication-level)
◮ Compare interference amount:
◮ Compare application running times (application-level)
◮ In simulation: collect data on “real” platforms, compare reconstructed to initial
◮ In situ: most comparisons not applicable, so hard to assess quality, but still usable
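The communication-level comparison can be illustrated with a small sketch. The metric below (a symmetric ratio that is always at least 1, averaged geometrically over host pairs) is an assumption chosen for illustration; the slides do not define the exact formula, and the function names and sample values are hypothetical.

```python
import math


def accuracy(measured, predicted):
    """Symmetric error ratio, always >= 1 (1 means a perfect prediction).

    ASSUMPTION: this particular metric is illustrative, not taken from the slides.
    """
    if measured <= 0 or predicted <= 0:
        raise ValueError("values must be positive")
    return max(measured / predicted, predicted / measured)


def mean_accuracy(measured, predicted):
    """Geometric mean of the per-pair ratios over all measured host pairs."""
    logs = [math.log(accuracy(measured[p], predicted[p])) for p in measured]
    return math.exp(sum(logs) / len(logs))


# Hypothetical latencies (ms) on the initial and on the reconstructed platform.
initial = {("a", "b"): 10.0, ("a", "c"): 20.0}
rebuilt = {("a", "b"): 10.0, ("a", "c"): 40.0}
score = mean_accuracy(initial, rebuilt)
```

A symmetric ratio penalizes over- and under-estimation equally, which is convenient when comparing algorithms that err in opposite directions.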
◮ Clique: connect all pairs of nodes, label edges with measured values
◮ Spanning trees: use edges with lowest latency or highest bandwidth
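The two baseline reconstructions can be sketched as follows. Kruskal's algorithm is used here as one standard way to build a spanning tree; the slides do not say which algorithm ALNeM uses, and the function names and sample measurements are hypothetical.

```python
def clique(measures):
    """Clique: one edge per measured pair, labeled with the measured value."""
    return dict(measures)


def spanning_tree(nodes, measures, prefer_low=True):
    """Kruskal's algorithm over the measured pairs.

    prefer_low=True keeps the lowest values (latency tree);
    prefer_low=False keeps the highest values (bandwidth tree).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = {}
    ordered = sorted(measures.items(), key=lambda kv: kv[1], reverse=not prefer_low)
    for (u, v), value in ordered:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components: keep it
            parent[ru] = rv
            tree[(u, v)] = value
    return tree


# Hypothetical symmetric latency measurements (ms) between three hosts.
lat = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
lat_tree = spanning_tree(["a", "b", "c"], lat, prefer_low=True)
```

With n hosts the clique keeps n(n-1)/2 edges while a spanning tree keeps n-1, so the tree necessarily drops information about the remaining pairs.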
◮ Add links to the spanning trees, to improve predictions
◮ Connect the closest badly predicted nodes (latency over-estimated)
◮ Update the routing if it improves the predictions
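One possible reading of this improvement step, as a hedged sketch rather than the authors' implementation. It assumes shortest-path routing recomputed after every added link, a hypothetical over-estimation threshold `tolerance`, and takes "closest" to mean the badly predicted pair with the smallest measured latency.

```python
import itertools


def predicted_latencies(nodes, edges):
    """All-pairs shortest-path latencies (Floyd-Warshall) on an undirected graph."""
    inf = float("inf")
    dist = {(u, v): (0.0 if u == v else inf) for u in nodes for v in nodes}
    for (u, v), lat in edges.items():
        dist[(u, v)] = min(dist[(u, v)], lat)
        dist[(v, u)] = min(dist[(v, u)], lat)
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in itertools.product(nodes, repeat=3):
        if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
            dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    return dist


def improve_tree(nodes, tree, measured, tolerance=1.5):
    """Add direct links until no pair is over-estimated by more than `tolerance`.

    ASSUMPTIONS: threshold value and shortest-path routing are illustrative choices.
    """
    edges = dict(tree)
    while True:
        pred = predicted_latencies(nodes, edges)
        bad = [p for p in measured if pred[p] > tolerance * measured[p]]
        if not bad:
            return edges
        # "Closest" badly predicted pair: smallest measured latency.
        u, v = min(bad, key=lambda p: measured[p])
        edges[(u, v)] = measured[(u, v)]


# Hypothetical triangle where the tree a-b-c over-estimates the a-c latency.
nodes = ["a", "b", "c"]
measured = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
tree = {("a", "b"): 1.0, ("b", "c"): 1.0}
improved = improve_tree(nodes, tree, measured)
```

Each added link brings its own pair back within tolerance and never worsens another prediction, so the loop stops after at most one iteration per measured pair.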
◮ Grow a set of connected nodes
◮ For each node:
  ◮ Repeatedly add edges to better predict the latency
  ◮ Until all routes from this node to the set are satisfied
◮ Refrain from adding “redundant” edges
  ◮ A long link (pred > 2 × meas) is suspected of redundancy with forthcoming links
  ◮ Indeed, the green link is redundant, and is thus removed
◮ Real platform built manually (real measurements + admin feedback)
[Figure: six graph panels over the same sites: lyon, strasbourg, belfort, grenoble, clermont, bordeaux, montpellier, marseille, nice, rennes, besancon]
◮ Clique: very good (of course)
◮ Spanning Trees: missing links give bad predictions
◮ Improvement procedure helps
◮ Aggregate performs badly for bandwidth predictions
◮ Interf_init(ab, cd)?
◮ Tree*
  ◮ Misses some links
  ◮ Most existing interferences predicted correctly
  ◮ Many false positives
◮ Clique
  ◮ Predicts no interference at all
  ◮ No false positives
  ◮ Very few actual interferences
◮ Improvement reduces false positives
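A small sketch of why the clique predicts no interference while a tree can predict too much. It assumes that two flows interfere as soon as their routes share an undirected link, and that routing follows a fewest-hops path; both are simplifications, and the function names and the 4-host example are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def route(adj, src, dst):
    """Return the set of links on a fewest-hops path from src to dst (BFS)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        raise ValueError("no route between the two hosts")
    links = set()
    node = dst
    while prev[node] is not None:
        links.add(frozenset((node, prev[node])))
        node = prev[node]
    return links


def interf(adj, a, b, c, d):
    """True if flows a->b and c->d share at least one link (ASSUMED criterion)."""
    return bool(route(adj, a, b) & route(adj, c, d))


# Hypothetical 4-host example: a chain-shaped spanning tree versus the clique.
hosts = ["a", "b", "c", "d"]
tree = adjacency([("a", "c"), ("c", "d"), ("d", "b")])
full = adjacency([(u, v) for i, u in enumerate(hosts) for v in hosts[i + 1:]])
```

On the chain, the flow a→b crosses the link c–d and therefore collides with the flow c→d; on the clique every flow has its own direct link, so no pair of flows with distinct endpoints ever shares one.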
◮ Token and broadcast: same conclusions as for the end-to-end measurements
  ◮ Clique good, trees a bit worse, improvements work
◮ All2all and pmm: a completely new light
  ◮ Clique dramatically underestimates times: no contention between parallel communications
  ◮ Tree* overestimates times: missing links (as before)
  ◮ Improved algorithms have good predictive power
◮ GridG is a synthetic platform generator [Lu, Dinda – SuperComputing03]
◮ Experiment: 40 platforms (60 hosts – default GridG parameters)
[Figure: accuracy of bandwidth and latency predictions for Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]
[Figure: accuracy on token, broadcast, all2all and pmm for Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]
◮ Naive algorithms get bad results
◮ Improved trees yield good reconstructions
  ◮ ImpTreeBW error ≈ 3% for all2all (worst case)
◮ Several criteria of similarity between initial and reconstructed platforms
◮ Allows comparison of reconstruction algorithms from the application point of view
◮ Runs on the simulator or in situ thanks to GRAS (& SimGrid)
◮ Evaluated algorithms: Clique, Bandwidth or Latency Spanning Tree
◮ Evaluation conditions: simulator, both real and synthetic platforms
◮ Classical algorithms are not satisfactory
  ◮ Spanning trees: miss edges, leading to performance under-estimation
  ◮ Clique: does not capture any existing interference
◮ Improving spanning trees yields much better results
  ◮ Especially ImpTreeBW: uses both kinds of measurements
◮ New set of experiments: only leaf nodes run the measurement processes
[Figure: accuracy of bandwidth (BW) and latency (Lat) predictions for Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]
[Figure: accuracy on token, broadcast, all2all and pmm for Aggregate, Clique, ImpTreeBW, ImpTreeLat, TreeBW, TreeLat]
◮ None of the proposed heuristics is satisfactory
◮ Future work: improve this!
◮ Mix bandwidth and latency values
◮ Build graphs with internal (hidden) nodes
◮ Interference (but very expensive to acquire)
◮ Packet gap and back-to-back packets
◮ Packet loss, etc.
◮ Adapt to condition changes (bandwidth variation, node arrival/departure)
◮ Distribute the tool to end-users
◮ Test-bed for Grid researchers
◮ 9 sites in France, targets 5000 cores (2500 currently)
[Figure: three graph panels over the same clusters: gdx, bordeaux, toulouse, capricorne, sagittaire, icluster2, helios, azur, lille, parasol, paravent]
◮ Some links are missing, of course
◮ Bandwidth-induced graph is better, but G5K may be more focused on bandwidth
◮ Compare real measurements to those obtained in the simulator on the reconstructed platform
[Figure: performance factor of bandwidth and latency predictions for Clique, TreeBW, TreeLat]
◮ Clique very good, but trivial result
◮ Latency:
  ◮ Over-estimated when links are missing (longer path)
  ◮ Under-estimated when routing on G5K optimizes bandwidth
◮ Bandwidth mis-estimated:
  ◮ Technical issue in simulator:
[Figure: without GRAS, research (simulation code) and development (application code) are separated by a rewrite; with GRAS, research and development share one code on top of the GRAS API (GRDK, GRE)]
GRAS
◮ Framework for Rapid Development of Distributed Infrastructure
  ◮ Develop and tune on the simulator; deploy in situ without modification
◮ Efficient Grid Runtime Environment (result = application = prototype)
  ◮ Performance concern: efficient communication of structured data
  ◮ Portability concern: because of grid heterogeneity