SLIDE 1

Scalable Multi-Purpose Network Representation for Large Scale Distributed System Simulation

Laurent Bobelin1, Arnaud Legrand1, David A. González Márquez2, Pierre Navarro1, Martin Quinson3, Frédéric Suter4, Christophe Thiéry3

1 LIG, Grenoble University, France
2 Departamento de Computación, Universidad de Buenos Aires, Argentina
3 LORIA, Nancy University, France
4 IN2P3 Computing Center, CNRS/IN2P3 Lyon-Villeurbanne, France

ANR 08 SEGI 022 ANR 11 INFRA 13

CCGrid 2012

  • A. Legrand (CNRS) INRIA-MESCAL

Scalability vs. Accuracy 1 / 12

SLIDE 5

Large Scale Distributed Systems

LSDS (clusters, P2P, grids, volunteer computing, clouds, ...) are a pain

◮ analytic methods quickly become intractable and often fail to capture key characteristics of real systems
◮ experiments in the field are tedious, time-consuming, non-reproducible, sometimes even impossible

Hence, a lot of research in our area relies on simulation

LSDS simulation challenges

◮ scalability (both in terms of speed and memory)
◮ accuracy/validity/realism (a very context-dependent notion)
◮ genericity

Most works trade everything for scalability, although...

“Premature optimization is the root of all evil” – D. E. Knuth

SLIDE 11

Validity: Community Requirements

Networking: protocol design requires accurate packet-level simulations

Not everyone has such needs:

P2P DHT: geographic diversity, jitter, churn (no need for contention, only delay)
P2P streaming: network proximity, asymmetry, interference on the edge (ignore the core)
Grid: heterogeneity, complex topology, contention with large transfers (no need to focus on packets)
Volunteer Computing: dynamic availability, heterogeneity (little need for networking)
HPC: complex communication workload, protocol peculiarities (build on regularity and homogeneity)
Cloud: mixture of previous requirements

Consequence: most simulators are ad hoc and domain-specific (read “dead within a year or so”)
SLIDE 13

Network Communication Models

Packet-level simulation
The networking community has standards and many popular open-source projects (NS, GTNetS, OMNeT++, ...)

◮ full simulation of the whole protocol stack
◮ complex models, hard to instantiate
◮ inherently slow

Delay-based models
The simplest ones...

◮ communication time = constant delay, statistical distribution, LogP (Θ(1) footprint and O(1) computation)
◮ coordinate-based systems to account for geographic proximity (Θ(N) footprint and O(1) computation)

Although very scalable, these models ignore network congestion and typically assume large bisection bandwidth
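The coordinate-based bullet can be illustrated with a minimal Python sketch. It assumes Vivaldi-style coordinates (a 2D position plus a "height" for the access link); the `Coord` class, the `delay_ms` function, and the host names are hypothetical and only show why the footprint is Θ(N) while each lookup is O(1).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coord:
    """Hypothetical per-host coordinate: 2D position plus an access-link height."""
    x: float
    y: float
    h: float


def delay_ms(a: Coord, b: Coord) -> float:
    """O(1) latency estimate: Euclidean distance plus both heights."""
    return math.hypot(a.x - b.x, a.y - b.y) + a.h + b.h


# One coordinate per host: Theta(N) memory instead of an N x N delay matrix.
coords = {
    "paris": Coord(0.0, 0.0, 5.0),  # illustrative values, not measured data
    "lyon": Coord(3.0, 4.0, 2.0),
}

print(delay_ms(coords["paris"], coords["lyon"]))  # 5 + 5 + 2 = 12.0
```

Note that nothing in this estimate depends on other ongoing communications, which is exactly the congestion blindness mentioned above.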

SLIDE 16

Network Communication Models (cont’d)

Flow-level models
A communication (flow) is simulated as a single entity:

$T_{i,j}(S) = L_{i,j} + S / B_{i,j}$, where

  $S$: message size
  $L_{i,j}$: latency between i and j
  $B_{i,j}$: bandwidth between i and j

Estimating $B_{i,j}$ requires accounting for interactions with other flows

Assume steady-state and share bandwidth every time a new flow appears or disappears

Setting: a set of flows F and a set of links L

Constraints: for every link j,

$\sum_{i \,:\, \text{flow } i \text{ uses link } j} \varrho_i \le C_j$

where $\varrho_i$ is the rate allocated to flow i and $C_j$ the capacity of link j

Objective function

◮ Max-Min: $\max(\min_i \varrho_i)$
◮ or other fancy objectives, e.g., Reno ∼ $\max(\sum_i \log(\varrho_i))$
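The Max-Min objective under the per-link capacity constraints can be sketched in Python with the classical bottleneck ("progressive filling") procedure. This is an illustrative sketch, not the solver used by any particular simulator; the names `max_min_share` and `transfer_time`, the link and flow identifiers, and the numeric values are assumptions, and every flow is assumed to use at least one link.

```python
def max_min_share(capacity, routes):
    """capacity: {link: C_j}; routes: {flow: [links it uses]}.

    Returns {flow: rate} such that no link is overloaded and the smallest
    rate is as large as possible (max-min fairness).
    """
    remaining = dict(capacity)
    active = {flow: set(links) for flow, links in routes.items()}
    rate = {}
    while active:
        # Fair share each link could still offer to its not-yet-frozen flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        r = share[bottleneck]
        # Freeze every flow crossing the bottleneck at that rate.
        for flow in [f for f, links in active.items() if bottleneck in links]:
            rate[flow] = r
            for link in active.pop(flow):
                remaining[link] -= r
    return rate


def transfer_time(latency, size, bandwidth):
    """T(S) = L + S / B for a flow whose share B stays constant."""
    return latency + size / bandwidth


capacity = {"A": 10.0, "B": 4.0}                       # hypothetical links
routes = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}  # hypothetical flows
rates = max_min_share(capacity, routes)
print(rates)                                  # f2 and f3 get 2.0, f1 gets 8.0
print(transfer_time(0.5, 16.0, rates["f1"]))  # 0.5 + 16 / 8 = 2.5
```

In a steady-state flow simulator this sharing would be recomputed whenever a flow appears or disappears, so `transfer_time` with a fixed bandwidth only holds between two such events.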

SLIDE 17

Wrap up on flow-level models

Such fluid models can account for key TCP characteristics

◮ slow-start
◮ flow-control limitation
◮ RTT-unfairness
◮ cross traffic interference

They are a very reasonable approximation for most LSDC systems

Yet, many people think they are too complex to scale. Let’s prove them wrong! :-)

SLIDE 19

How to achieve scalability

Platform description
Main issues with topology

◮ description size, expressiveness
◮ memory footprint
◮ computation time

Classical network representation (N nodes and E links)

1 Flat representation

(Figure: an N × N table whose entries are lists of links such as {L12, L52, ..., L4})

5000 hosts don’t fit in 4Gb!

Representation   Input   Footprint   Parsing   Lookup
Flat             N^2     N^2         N^2       1
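A back-of-the-envelope reading of the 5000-host remark, assuming that 4Gb means roughly 4 × 10^9 bytes (the slide does not say how a route is stored):

```latex
N^2 = 5000^2 = 2.5 \times 10^{7} \text{ routes}, \qquad
\frac{4 \times 10^{9} \text{ bytes}}{2.5 \times 10^{7} \text{ routes}} = 160 \text{ bytes per route}
```

Under that assumption, a flat table overflows as soon as a stored route costs more than about 160 bytes on average.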

SLIDE 20

How to achieve scalability (cont’d)

Classical network representation (N nodes and E links)

2 Graph representation assuming shortest path routing

Representation   Input   Footprint     Parsing   Lookup
Dijkstra         N + E   E + N log N   N + E     E + N log N
Floyd            N + E   N^2           N^3       1
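The trade-off between the two graph-based rows can be sketched in Python: Dijkstra keeps only the graph and pays at lookup time, while Floyd pays once for an N × N table and then answers next-hop queries with a single access. The function names, the adjacency-list format, and the tiny graph are illustrative assumptions; every node is assumed to appear as a key of `graph`.

```python
import heapq


def dijkstra_route(graph, src, dst):
    """graph: {node: [(neighbor, latency), ...]}. Route computed on demand."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def floyd_next_hops(graph):
    """O(N^3) precomputation of an N x N next-hop table."""
    nodes = list(graph)
    inf = float("inf")
    dist = {u: {v: (0.0 if u == v else inf) for v in nodes} for u in nodes}
    nxt = {u: {} for u in nodes}
    for u in nodes:
        for v, w in graph[u]:
            if w < dist[u][v]:
                dist[u][v] = w
                nxt[u][v] = v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return nxt


graph = {  # hypothetical 3-node platform
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("a", 5.0), ("b", 1.0)],
}
print(dijkstra_route(graph, "a", "c"))  # ['a', 'b', 'c']
print(floyd_next_hops(graph)["a"]["c"])  # 'b'
```

With the Floyd table, only the next hop is a constant-time lookup here; rebuilding a full route still takes time proportional to its length.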

SLIDE 21

How to achieve scalability (cont’d)

Classical network representation (N nodes and E links)

3 Special class of structures (star, cloud, ...)

Representation   Input   Footprint   Parsing   Lookup
Star             1       N           N         1
Cloud            N       N           N         1

SLIDE 22

Our proposal

Every such representation has drawbacks and advantages

Let’s build on the fact that most networks are mostly hierarchical

1 Hierarchical organization in AS: cuts down complexity, recursive routing
2 Efficient representation of classical structures
3 Allow bypass at any level

(Figure: hierarchy of ASes labeled AS1, AS2, AS4, AS5, AS6, AS7 and AS5-1 to AS5-4, each annotated with its routing scheme: Empty + coords, Full, Dijkstra, Floyd, or Rule-based)

This approach has been integrated into the open-source SIMGRID simulation toolkit
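The idea of recursive routing over a hierarchy of ASes can be sketched in Python as follows. This is an illustration under stated assumptions, not SIMGRID's actual data structures or API: the `AS` class, its `gateway` and `routes` fields, the `route` function, and all host and link names are hypothetical. Each AS only stores routes between its direct children, and a route between two hosts is assembled from the route to the source-side gateway, the route inside the lowest common AS, and the route from the destination-side gateway.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class AS:
    """Hypothetical autonomous system: one node of the platform hierarchy."""
    name: str
    parent: Optional["AS"] = None
    gateway: Optional[str] = None               # host used to enter/leave this AS
    routes: dict = field(default_factory=dict)  # (child_a, child_b) -> [links]


def chain(a):
    """ASes from `a` up to the root of the hierarchy."""
    out = []
    while a is not None:
        out.append(a)
        a = a.parent
    return out


def route(src, dst, where):
    """Links from host src to host dst; `where` maps a host to its AS."""
    if src == dst:
        return []
    up_src, up_dst = chain(where[src]), chain(where[dst])
    common = next(a for a in up_src if a in up_dst)  # lowest common AS

    def child(host, up):
        # Direct child of `common` on this side: the host itself or a sub-AS.
        i = up.index(common)
        return (host, None) if i == 0 else (up[i - 1].name, up[i - 1])

    (a_name, a_as), (b_name, b_as) = child(src, up_src), child(dst, up_dst)
    links = []
    if a_as is not None:
        links += route(src, a_as.gateway, where)   # climb to the gateway
    links += common.routes[(a_name, b_name)]       # local routing table
    if b_as is not None:
        links += route(b_as.gateway, dst, where)   # descend from the gateway
    return links


root = AS("root")
as1 = AS("AS1", parent=root, gateway="gw1", routes={("h1", "gw1"): ["l1"]})
as2 = AS("AS2", parent=root, gateway="gw2", routes={("gw2", "h2"): ["l2"]})
root.routes[("AS1", "AS2")] = ["backbone"]
where = {"h1": as1, "gw1": as1, "h2": as2, "gw2": as2}

print(route("h1", "h2", where))  # ['l1', 'backbone', 'l2']
```

In this sketch each AS could just as well answer its local lookup with a full table, Dijkstra, Floyd, or a rule, which is how different representations can coexist at different levels.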

SLIDE 24

Evaluation

Size of platform description file

Community   Scenario                                       Size
P2P         2,500 peers with Vivaldi coordinates           294KB
VC          5120 volunteers                                435KB + 90MB
Grid        Grid5000: 10 sites, 40 clusters, 1500 nodes    22KB
HPC         1 cluster of 262144 nodes                      5KB
HPC         Hierarchy of 4096 clusters of 64 nodes         27MB
Cloud       3 small data centers + Vivaldi                 10KB

Grid Scenario: a master distributes 500,000 fixed size jobs to 2,000 workers in a round-robin way

                GRIDSIM             SIMGRID
Network model   delay-based model   flow model
Topology        none                Grid5000
Time            1h                  14s
Memory          4.4GB               165MB⋆

⋆ 5.2Mb are used to represent the Grid 5000. Stack size not optimized (80KB/worker)
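The footnote is consistent with the memory figure in the table, as a quick check shows. This assumes decimal units and reads the footnote's "5.2Mb" as megabytes:

```latex
2000 \text{ workers} \times 80\,\text{KB} = 160\,\text{MB}, \qquad
160\,\text{MB} + 5.2\,\text{MB} \approx 165\,\text{MB}
```

Under that reading, almost all of the SIMGRID footprint comes from worker stacks rather than from the platform representation.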

SLIDE 25

P2P DHT

◮ Scenario: Initialize Chord, and simulate 1000 seconds of protocol
◮ Arbitrary Time Limit: 12 hours (kill simulation afterward)

(Figure: running time in seconds versus number of nodes, up to 2e+06 nodes, for OverSim (OMNeT++ underlay), OverSim (simple underlay), PeerSim, SimGrid (flow-based), and SimGrid (delay-based))

Largest simulated scenario

Simulator           Size   Time
OverSim (OMNeT++)   10k    1h40
OverSim (simple)    300k   10h
PeerSim             100k   4h36
SG (flow-based)     10k    130s
                    300k   32mn
                    2M∗    6h23
SG (delay-based)    2M     5h30

∗ 36GB = 18kB/process (16kB for the stack)

◮ SIMGRID is orders of magnitude more scalable than state-of-the-art P2P simulators
◮ Using the precise flow-based model incurs a limited (≈ 20%) slowdown, while simulation accuracy is improved
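The footnote and the slowdown claim can be checked against the table with simple arithmetic, assuming decimal units and taking the two 2M-node runs as the comparison point:

```latex
2 \times 10^{6} \times 18\,\text{kB} = 36\,\text{GB}, \qquad
2 \times 10^{6} \times 16\,\text{kB} = 32\,\text{GB (stacks)}, \qquad
\frac{6\text{h}23}{5\text{h}30} = \frac{383\,\text{min}}{330\,\text{min}} \approx 1.16
```

For that scenario the flow-based model is about 16% slower than the delay-based one, in line with the ≈ 20% figure.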

SLIDE 26

HPC workload

(Figure: simulation time in seconds, log scale from 0.01 to 10000, versus log2 of the number of processes, from 10 to 24, for SimGrid and LogGoPSim)

Simulating a binomial broadcast:

◮ SIMGRID is roughly 75% slower than LOGGOPSIM
◮ SIMGRID is at least 20% more memory-hungry than LOGGOPSIM (15GB required for 2^23 processors)

The genericity of SIMGRID data structures comes at the cost of a slight overhead

This demonstrates that scalability does not necessarily come at the price of realism (e.g., ignoring contention on the interconnect)
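As a rough per-process figure, reading the processor count as 2^23 (the plot's axis is the log2 of the number of processes, up to 24) and assuming decimal gigabytes:

```latex
\frac{15 \times 10^{9}\ \text{bytes}}{2^{23}\ \text{processes}}
= \frac{15 \times 10^{9}}{8\,388\,608} \approx 1.8\,\text{kB per process}
```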

SLIDE 27

Conclusion

Take away message

◮ The widespread belief that “scalable simulations require oversimplifying the network models and avoiding the use of threads” is erroneous
◮ SIMGRID is open-source, mature, and does not trade accuracy and meaning for scalability: use it instead of rewriting ad hoc simulators

http://simgrid.gforge.inria.fr

Future plan

1 Further reduce platform description size (hence parsing time) and memory footprint by exploiting stochastic regularity and improving the programmable description approach
2 Consider the specifics of emerging computing systems such as clouds or exascale platforms: http://infra-songs.gforge.inria.fr/
