Large-scale experiments on a cluster (Liang Wang)



SLIDE 1

Large-scale experiments on a cluster

Liang Wang
Supervisor: Prof. Jussi Kangasharju
Dept. of Computer Science
University of Helsinki, Finland
SLIDE 2
  • Motivation
  • Modern systems are large and distributed.
  • Need to evaluate robustness, adaptability and performance.
  • Three (four) options
  • Simulator
  • Internet
  • Cluster
  • (Analytical)

Large-scale experiments

SLIDE 3
  • With a cluster, we can
  • easily control all the participants and access all the data;
  • make large-scale experiments reproducible;
  • simulate different real-life scenarios by varying the parameters.
  • It looks beautiful; however,
  • the cluster is always “smaller” than the experiment scale we want;
  • designing and deploying experiments is non-trivial.

Why on the cluster

SLIDE 4
  • Introduction
  • computing infrastructure for research and education purposes in the Dept. of Computer Science, Univ. of Helsinki.
  • everyone in the department can access it.
  • Specification
  • 240 Dell PowerEdge M610 nodes, connected with 10-Gb links;
  • each node has 32 GB of RAM and two Intel Xeon E5540 2.53 GHz CPUs;
  • each CPU has 4 cores, so there can be 16 concurrent threads per node thanks to hyper-threading.
  • (Part of our work was done on the HIIT cluster.)

Ukko cluster

SLIDE 5
  • Aims in the long run
  • In a nutshell: measure & evaluate large-scale distributed systems in a systematic and consistent manner.
  • Currently, we ...
  • focus on P2P system (BitTorrent) evaluation in a cluster environment.
  • develop simple but flexible tools to deploy the experiments and automate the whole process (deployment, data collection, simple analysis).
  • figure out the various restrictions on large-scale experiments on the Ukko cluster.
  • study how to design reasonable experiments.
  • try to gain experience for the future evaluation of other systems.

Our work & aims

SLIDE 6
  • Why it is worth studying
  • The dominant file-sharing protocol in the world - real-world data can be used to validate the results from the cluster experiments.
  • A good starting point - there is abundant literature to refer to.
  • A typical complex system - peer-level behaviors are simple and easy to understand, yet the system’s overall behavior is complicated.
  • Experiment target
  • Instrumented clients are widely used in research. There are several ready-made ones, but none is full-fledged. We use our own BitTorrent client, based on the official version.
  • Evaluate different implementations, mainly focusing on Mainline Ver4.

BitTorrent experiment

SLIDE 7
  • Bypass I/O
  • I/O operations to the hard disk are bypassed - not only because of the limited storage capacity, but also because disk I/O is the first performance bottleneck.
  • With the simplest experiment setting - one seeder, one leecher, and no limits on the transmission rate:

Some practical issues


I/O bypassed?   Stable transmission rate   CPU time on I/O wait
No              70 MB/s                    over 85%
Yes             115 MB/s                   almost 0%

[Figure: Node A (MLBT) connected to Node B (MLBT) over a 1-Gb link]
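The bypass idea can be sketched as a file-like null storage object that the client uses in place of real files. This is a minimal illustration only; the class name and interface are our own, not the instrumented client's actual code:

```python
import io


class NullStorage(io.RawIOBase):
    """Stand-in for on-disk piece storage: writes are discarded and
    reads return dummy bytes, so no request ever reaches the disk."""

    def __init__(self, size):
        self.size = size  # advertised content size in bytes
        self.pos = 0

    def writable(self):
        return True

    def write(self, data):
        # Pretend the piece was persisted; nothing touches the disk.
        self.pos = min(self.size, self.pos + len(data))
        return len(data)

    def readable(self):
        return True

    def read(self, n=-1):
        # Serve zeroed bytes instead of real file content.
        if n < 0 or n > self.size - self.pos:
            n = self.size - self.pos
        self.pos += n
        return b"\x00" * n

    def seekable(self):
        return True

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            pos += self.pos
        elif whence == io.SEEK_END:
            pos += self.size
        self.pos = pos
        return self.pos
```

Because piece contents are irrelevant to the transfer measurements, serving zeros costs almost nothing, which is consistent with the "almost 0% I/O wait" row in the table above.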

SLIDE 8
  • Running multiple instances on one node
  • Reason: maximize the utilization; enlarge the experiment scale with limited resources.
  • Method: application-layer isolation, no hypervisor is used. Pros & cons?
  • Lots of nasty issues need to be taken care of -- e.g. I/O overhead, storage issues, system parameters.
  • Bypass the write operations, redirect the read operations.

Some practical issues (contd.)


[Figure: many MLBT instances on one node; all READ operations are redirected to a single shared file, send & recv go over the network, and WRITE operations are discarded]
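The read-redirection half can be sketched as follows, assuming a memory-mapped shared file (the class name and `read_piece` method are illustrative, not from the actual tooling): every instance on a node maps the same read-only content file, so N instances need only one copy on disk and in the page cache.

```python
import mmap
import os


class SharedReadStorage:
    """Redirect the reads of every co-located client instance to one
    shared read-only copy of the content file, so N instances on a node
    need only a single copy on disk (and in the OS page cache)."""

    def __init__(self, path):
        fd = os.open(path, os.O_RDONLY)
        try:
            # Map the whole file read-only; the mapping is shared by
            # the kernel across all instances opening the same file.
            self._map = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        finally:
            os.close(fd)

    def read_piece(self, offset, length):
        # Every instance reads the same mapped pages; the kernel caches
        # them once, no matter how many instances are running.
        return bytes(self._map[offset:offset + length])
```

Combined with discarded writes, this keeps the per-instance I/O overhead low enough to pack many MLBT instances onto one node.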

SLIDE 9
  • Tune the parameters
  • The default parameters may work well on a home connection with low bandwidth, but some of them are not suitable on a high-performance cluster.
  • Sending buffer (reduces write operations to the network interface), slice size (reduces read operations). Control the number of concurrent uploads, which is calculated from the upload rate.
  • Other restrictions
  • For example, ip_local_port_range = 32768 ~ 61000 (28232 ports available)
  • CPU, memory, max sockets, max open files, max processes, etc.
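A pre-flight check against such limits could look like the sketch below. The function name and the per-peer socket budget are our own assumptions; the default `port_range` mirrors the ip_local_port_range values quoted above.

```python
import resource  # Unix-only: query per-process resource limits


def check_limits(peers_per_node, sockets_per_peer,
                 port_range=(32768, 61000)):
    """Rough pre-flight check before launching many client instances on
    one node. `sockets_per_peer` is an assumed budget, not a measured
    value; `port_range` defaults to the Linux ip_local_port_range."""
    lo, hi = port_range
    # Ephemeral ports are shared by all instances on the node.
    ports_ok = peers_per_node * sockets_per_peer <= hi - lo
    # The open-file limit applies per process, i.e. per instance,
    # and every socket counts against it.
    soft_fds, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    fds_ok = sockets_per_peer < soft_fds
    return ports_ok and fds_ok
```

For example, 400 peers averaging 80 outgoing connections each would need 32000 ephemeral ports, more than the 28232 the default range provides, so the check fails before the experiment silently breaks.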

Some practical issues (contd.)

SLIDE 10

Some practical issues (contd.)

SLIDE 11

Some practical issues (contd.)


[Figure: measurement plots with the safe operating region annotated]

SLIDE 12
  • Homogeneous experiment: all MLBT instances with the same configuration
  • Two types of experiments: upload-constrained & download-constrained
  • Two types of outgoing connections: connections to the native peers (on the same node) & connections to the foreign peers (on the other node)

Two-node experiment


[Figure: Node A and Node B, each running six MLBT instances]
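As a point of reference for the native/foreign split, if peers were chosen uniformly at random from the whole swarm (an assumption for illustration; the slides do not state the selection policy), the expected fraction of native connections would be:

```python
def expected_native_fraction(nodes, peers_per_node):
    """Expected fraction of a peer's connections that go to native
    peers (peers on the same node), assuming connections are made to
    peers drawn uniformly at random from the rest of the swarm."""
    total = nodes * peers_per_node
    # A peer has peers_per_node - 1 native candidates among the
    # total - 1 other peers in the swarm.
    return (peers_per_node - 1) / (total - 1)
```

With two nodes of six instances each this baseline is 5/11 ≈ 0.45; deviations from it in the measured ratio indicate that bandwidth constraints are biasing BitTorrent's peer selection.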

SLIDE 13
  • Two-node experiment: upload-constrained

Change in BT’s behaviors

SLIDE 14
  • Two-node experiment: download-constrained

Change in BT’s behaviors

SLIDE 15
  • Homogeneous experiment: all MLBT instances with the same configuration

How about three nodes?


[Figure: Node A, Node B, and Node C, each running six MLBT instances]

SLIDE 16
  • How about 3 nodes? (download-constrained)

Change in BT’s behaviors

SLIDE 17
  • To experiment on a cluster, we must consider
  • the experiment target (protocols and implementations);
  • platform configurations and limitations (these depend on the underlying OS);
  • network configuration and topology.
  • Many things can be bottlenecks, so the experiment must be carefully designed!

Conclusion

SLIDE 18
  • Any other conclusions here?
  • It seems experimenting on a cluster is “dangerous”: too many underlying details, too many hacks, and too many restrictions can mess up an experiment.
  • Don’t forget the benefits of the cluster!
  • It is feasible, but we need to be very careful.
  • Always know, or at least try to know, every underlying detail.
  • Always design rational experiments.
  • Always play in the safe region.

Conclusion (contd.)

SLIDE 19

Thank you!

Liang Wang, Dept. of Computer Science

SLIDE 20

Extra figure of exp on Ukko


[Figure: extra results from Ukko. (a) Upload rate (MB/s) vs. peers/node, with capacity plans y=560/x (2 nodes) and y=244/x+20 (205 nodes). (b) CDF of average download rate (KB/s), Mainline Ver4. (c) Aria2 evaluation: ratio of upload connections to the native peers vs. peers/node, on nodes cln023 and cln024. (d) Aria2 evaluation with 10450 peers, UTPEX enabled: CDF of average upload (avg. ul) and download (avg. dl) rates (KB/s).]