
Scalable Multi-Purpose Network Representation for Large Scale Distributed System Simulation



Laurent Bobelin (1), Arnaud Legrand (1), David A. González Márquez (2), Pierre Navarro (1), Martin Quinson (3), Frédéric Suter (4), Christophe Thiéry (3)

(1) LIG, Grenoble University, France
(2) Departamento de Computación, Universidad de Buenos Aires, Argentina
(3) LORIA, Nancy University, France
(4) IN2P3 Computing Center, CNRS/IN2P3, Lyon-Villeurbanne, France

ANR 08 SEGI 022, ANR 11 INFRA 13. CCGrid 2012.
A. Legrand (CNRS), INRIA-MESCAL. Scalability vs. Accuracy.


Large Scale Distributed Systems

LSDS (clusters, P2P, grids, volunteer computing, clouds, ...) are a pain:
◮ analytic methods quickly become intractable and often fail to capture key characteristics of real systems;
◮ field experiments are tedious, time-consuming, non-reproducible, and sometimes even impossible.
Hence, much of the research in our area relies on simulation.

LSDS simulation challenges:
◮ scalability (in terms of both speed and memory)
◮ accuracy/validity/realism (a very context-dependent notion)
◮ genericity

Most works trade everything else for scalability, although...
"Premature optimization is the root of all evil." (D. E. Knuth)


Validity: Community Requirements

Networking: protocol design requires accurate packet-level simulations.
Not everyone has such needs:
◮ P2P DHT: geographic diversity, jitter, churn → no need for contention, only delay
◮ P2P streaming: network proximity, asymmetry, interference on the edge → ignore the core
◮ Grid: heterogeneity, complex topology, contention with large transfers → no need to focus on packets
◮ Volunteer computing: dynamic availability, heterogeneity → little need for networking
◮ HPC: complex communication workload, protocol peculiarities → build on regularity and homogeneity
◮ Cloud: a mixture of the previous requirements

Consequence: most simulators are ad hoc and domain-specific (read: "dead within a year or so").


Network Communication Models

Packet-level simulation
The networking community has standards and many popular open-source projects (NS, GTNetS, OMNeT++, ...):
◮ full simulation of the whole protocol stack
◮ complex models → hard to instantiate
◮ inherently slow

Delay-based models
The simplest ones...
◮ communication time = constant delay, statistical distribution, LogP → Θ(1) footprint and O(1) computation
◮ coordinate-based systems to account for geographic proximity → Θ(N) footprint and O(1) computation
Although very scalable, these models ignore network congestion and typically assume a large bisection bandwidth.
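The two delay-based variants above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from any particular simulator: the class names, the Euclidean-distance rule and the hypothetical `scale` factor (coordinate units to seconds) are choices made here, and how the coordinates are estimated from measurements (as Vivaldi-style systems do) is not shown. Neither model ever looks at link bandwidth, which is why they cannot capture congestion.

```python
import math
from dataclasses import dataclass


@dataclass
class ConstantDelayModel:
    """Theta(1) footprint: a single delay shared by every pair of hosts."""
    delay: float  # seconds (assumed unit)

    def comm_time(self, src: int, dst: int) -> float:
        # O(1): message size and topology are ignored entirely.
        return 0.0 if src == dst else self.delay


class CoordinateDelayModel:
    """Theta(N) footprint: one coordinate vector per host.

    The delay between two hosts is the Euclidean distance between their
    coordinates; no link is modeled, so there is no contention at all.
    """

    def __init__(self, coords: dict, scale: float = 1e-3):
        self.coords = coords  # host id -> tuple of coordinates
        self.scale = scale    # hypothetical factor: coordinate units -> seconds

    def comm_time(self, src: int, dst: int) -> float:
        # O(1) per query for a fixed number of dimensions.
        return self.scale * math.dist(self.coords[src], self.coords[dst])


if __name__ == "__main__":
    const = ConstantDelayModel(delay=0.05)
    coord = CoordinateDelayModel({0: (0.0, 0.0), 1: (30.0, 40.0), 2: (60.0, 80.0)})
    print(const.comm_time(0, 2))  # 0.05
    print(coord.comm_time(0, 1))  # 50 units * 1e-3, about 0.05 s
```

Storing one vector per host is what gives the Θ(N) footprint, as opposed to a full N × N latency matrix.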


Network Communication Models (cont'd)

Flow-level models
A communication (flow) is simulated as a single entity:
$T_{i,j}(S) = L_{i,j} + S / B_{i,j}$, where $S$ is the message size, $L_{i,j}$ the latency between $i$ and $j$, and $B_{i,j}$ the bandwidth between $i$ and $j$.
Estimating $B_{i,j}$ requires accounting for interactions with other flows.

Assume steady state and share the bandwidth every time a new flow appears or disappears.
Setting: a set of flows $\mathcal{F}$ and a set of links $\mathcal{L}$; $\rho_i$ is the bandwidth allotted to flow $i$ and $C_j$ the capacity of link $j$.
Constraints: for every link $j \in \mathcal{L}$, $\sum_{i \in \mathcal{F} \,:\, i \text{ uses } j} \rho_i \le C_j$
Objective function:
◮ Max-Min: $\max\big(\min_{i \in \mathcal{F}} \rho_i\big)$
◮ or other fancy objectives, e.g., Reno ∼ $\max\big(\sum_{i \in \mathcal{F}} \log \rho_i\big)$
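The Max-Min objective above can be computed with the classic bottleneck (progressive-filling) algorithm, and the resulting $\rho_i$ then plays the role of $B_{i,j}$ in $T_{i,j}(S)$. A minimal sketch, assuming routes are given as lists of link names and capacities in bytes/s; the function names are illustrative, and the Reno-like $\sum \log \rho_i$ objective is not implemented here.

```python
def max_min_share(capacity, routes):
    """Max-min fair rates rho_i via the classic bottleneck algorithm.

    capacity: dict link -> C_j
    routes:   dict flow -> list of links used by the flow
    At the end, for every link j: sum of rho_i over flows using j <= C_j.
    """
    rate = {}
    remaining = dict(capacity)  # capacity not yet allotted, per link
    active = set(routes)        # flows whose rate is not fixed yet
    for f in [f for f in active if not routes[f]]:
        rate[f] = float("inf")  # crosses no link: unconstrained
        active.discard(f)
    while active:
        # Count the still-active flows crossing each link.
        users = {}
        for f in active:
            for link in routes[f]:
                users[link] = users.get(link, 0) + 1
        # The bottleneck is the link offering the smallest equal share.
        bottleneck = min(users, key=lambda l: remaining[l] / users[l])
        share = remaining[bottleneck] / users[bottleneck]
        # Freeze every active flow crossing the bottleneck at that share.
        frozen = {f for f in active if bottleneck in routes[f]}
        for f in frozen:
            rate[f] = share
            for link in routes[f]:
                remaining[link] -= share
        active -= frozen
    return rate


def transfer_time(latency, size, bandwidth):
    """T_ij(S) = L_ij + S / B_ij, valid only while the set of flows is unchanged."""
    return latency + size / bandwidth


if __name__ == "__main__":
    capacity = {"A": 10e6, "B": 4e6}  # bytes/s
    routes = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
    rho = max_min_share(capacity, routes)
    print(rho)  # f2 and f3 split B (2e6 each); f1 gets the rest of A (8e6)
    print(transfer_time(0.01, 1e6, rho["f2"]))  # 0.01 + 1e6 / 2e6 = 0.51 s
```

In a simulator, the share would be recomputed whenever a flow starts or ends, as the slide states; the single call above only covers one steady-state interval.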

Wrap-up on flow-level models

Such fluid models can account for key TCP characteristics:
◮ slow-start
◮ flow-control limitation
◮ RTT-unfairness
◮ cross-traffic interference
They are a very reasonable approximation for most LSDS.
Yet, many people think they are too complex to scale. Let's prove them wrong! :)
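Two of the TCP effects listed above can be folded into the sharing step of a flow-level model. The sketch below is an illustration under loud assumptions, not the authors' model: it takes RTT as twice the one-way latency, caps each flow at W/RTT for a hypothetical maximum window W (flow-control limitation), and weights each flow's share by 1/RTT so that short-RTT flows obtain more bandwidth (RTT-unfairness). Slow-start and cross-traffic interference are not modeled, and latencies are assumed positive.

```python
def weighted_max_min(capacity, routes, weight, cap):
    """Weighted max-min sharing with per-flow rate caps (progressive filling).

    All active flows grow as rho_i = weight[i] * t for a common level t; a flow
    is frozen when it reaches cap[i] or when one of its links saturates.
    """
    rate = {}
    remaining = dict(capacity)
    active = set(routes)
    while active:
        # Total weight of the active flows on each link.
        load = {}
        for f in active:
            for link in routes[f]:
                load[link] = load.get(link, 0.0) + weight[f]
        link_level = {l: remaining[l] / load[l] for l in load}
        cap_level = {f: cap[f] / weight[f] for f in active}
        # The next event is the lowest level at which a link or a cap is hit.
        t = min(list(link_level.values()) + list(cap_level.values()))
        saturated = {l for l, lv in link_level.items() if lv <= t}
        frozen = {f for f in active
                  if cap_level[f] <= t or any(l in saturated for l in routes[f])}
        for f in frozen:
            rate[f] = min(cap[f], weight[f] * t)
            for link in routes[f]:
                remaining[link] -= rate[f]
        active -= frozen
    return rate


def tcp_like_share(capacity, routes, one_way_latency, window):
    """Hypothetical TCP-flavoured parameters; RTT = 2 * latency is an assumption."""
    rtt = {f: 2.0 * one_way_latency[f] for f in routes}
    weight = {f: 1.0 / rtt[f] for f in routes}  # RTT-unfairness
    cap = {f: window / rtt[f] for f in routes}  # window limitation W/RTT
    return weighted_max_min(capacity, routes, weight, cap)


if __name__ == "__main__":
    capacity = {"A": 10e6}  # bytes/s
    routes = {"short": ["A"], "long": ["A"]}
    latency = {"short": 0.005, "long": 0.02}
    # Large window: the link saturates, shares follow the 4:1 inverse-RTT ratio.
    print(tcp_like_share(capacity, routes, latency, window=2**20))
    # Small window: both flows are window-limited and the link is not saturated.
    print(tcp_like_share(capacity, routes, latency, window=2**16))
```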

How to achieve scalability

Platform description: N nodes and E links.
Main issues with topology:
◮ description size, expressiveness
◮ memory footprint
◮ computation time

Classical network representations:
1. Flat representation: an explicit list of links for every pair of hosts, such as {L12, L52, . . . , L4}. 5000 hosts don't fit in 4 GB!
2. Graph representation, assuming shortest-path routing

Representation | Input | Footprint | Parsing | Lookup
Flat | N² | N² | N² | 1
Dijkstra | N + E | N + E | N + E | E + N log N
Floyd | N + E | N² | N³ | 1
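The two graph-based rows of the table can be contrasted with a small sketch. It assumes an undirected graph with positive link weights stored as adjacency lists (the N + E input), computes routes on demand with Dijkstra's algorithm, and alternatively precomputes an all-pairs next-hop table with Floyd-Warshall; all function names are illustrative. Two caveats: with Python's binary heap (`heapq`), Dijkstra costs O((N + E) log N) per lookup rather than the Fibonacci-heap bound E + N log N in the table, and the Floyd table returns the next hop in O(1), so extracting a full route costs one step per hop. Materializing every route yields the flat representation.

```python
import heapq
from itertools import product

INF = float("inf")


def build_adj(edges):
    """Undirected adjacency lists: node -> [(neighbor, weight)], Theta(N + E)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra_route(adj, src, dst):
    """Compute one shortest route on demand; only the graph is stored."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # the first pop of dst carries its final distance
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, INF):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def floyd_next_hop(adj):
    """All-pairs next-hop table: Theta(N^2) memory, Theta(N^3) time."""
    nodes = list(adj)
    dist = {(i, j): 0.0 if i == j else INF for i, j in product(nodes, repeat=2)}
    nxt = {(i, i): i for i in nodes}
    for u in nodes:
        for v, w in adj[u]:
            if w < dist[(u, v)]:
                dist[(u, v)], nxt[(u, v)] = w, v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
                    nxt[(i, j)] = nxt[(i, k)]
    return nxt


def floyd_route(nxt, src, dst):
    """Follow next hops from the precomputed table: O(1) per hop."""
    if (src, dst) not in nxt:
        return None
    path = [src]
    while path[-1] != dst:
        path.append(nxt[(path[-1], dst)])
    return path


if __name__ == "__main__":
    # Chain 0-1-2-3-4 plus a costly shortcut 0-3 (weights are arbitrary).
    adj = build_adj([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (0, 3, 5.0)])
    nxt = floyd_next_hop(adj)
    flat = {(s, d): floyd_route(nxt, s, d) for s, d in product(adj, repeat=2)}
    print(dijkstra_route(adj, 0, 4))  # [0, 1, 2, 3, 4]
    print(len(flat))                  # N^2 = 25 explicit routes
```

The trade-off in the table shows up directly: Dijkstra keeps only `adj` but pays a search per lookup, while Floyd and the flat table pay N² memory up front for constant-time (per hop) lookups.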
