

  1. Large-scale experiments on a cluster
Liang Wang
Supervisor: Prof. Jussi Kangasharju
Dept. of Computer Science, University of Helsinki, Finland

  2. Large-scale experiments
• Motivation
  • Modern systems are large and distributed.
  • We need to evaluate their robustness, adaptability, and performance.
• Three (four) options
  • Simulator
  • Internet
  • Cluster
  • (Analytical)

  3. Why on the cluster
• With a cluster, we can
  • easily control all the participants and access all the data;
  • make large-scale experiments reproducible;
  • simulate different real-life scenarios by using different parameters.
• It looks beautiful; however,
  • the cluster is always "smaller" than the experiment scale we want;
  • designing and deploying experiments is non-trivial.

  4. Ukko cluster
• Introduction
  • Computing infrastructure for research and education purposes in the Dept. of Computer Science, Univ. of Helsinki.
  • Everyone in the department can access it.
• Specification
  • 240 Dell PowerEdge M610 nodes, connected with 10-Gb links;
  • each node has 32 GB of RAM and 2 Intel Xeon E5540 2.53 GHz CPUs;
  • each CPU has 4 cores, so a node can run 16 concurrent threads thanks to hyper-threading.
• (Part of our work was done on the HIIT cluster.)

  5. Our work &amp; aims
• Aims in the long run
  • In a nutshell: measure and evaluate large-scale distributed systems in a systematic and consistent manner.
• Currently, we ...
  • focus on P2P system (BitTorrent) evaluation in a cluster environment;
  • develop simple but flexible tools to deploy the experiments and automate the whole process (deploying, collecting data, simple analysis);
  • figure out the various restrictions on large-scale experiments on the Ukko cluster;
  • study how to design reasonable experiments;
  • try to gain experience for future evaluations of other systems.
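The deployment-automation idea above can be sketched as a small planner that spreads client instances over cluster nodes. This is a hypothetical sketch only: the node names, the `./mlbt` binary path, and the port scheme are illustrative assumptions, not the authors' actual tooling.

```python
# Hypothetical sketch of a deployment planner: spread N client instances
# evenly over a set of cluster nodes and give each instance its own port.
# Node names, the "./mlbt" path, and the port scheme are assumptions.

def plan_experiment(nodes, total_instances, base_port=6881):
    """Return {node: [launch command, ...]} with instances spread evenly;
    the first nodes absorb the remainder when the split is uneven."""
    plan = {}
    for i, node in enumerate(nodes):
        count = total_instances // len(nodes)
        if i < total_instances % len(nodes):
            count += 1
        plan[node] = [f"./mlbt --port {base_port + k}" for k in range(count)]
    return plan

plan = plan_experiment(["ukko001", "ukko002", "ukko003"], 8)
for node, cmds in plan.items():
    print(node, len(cmds), "instances")
```

A real driver would then push these commands to each node (e.g. over SSH) and pull the log files back for analysis.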

  6. BitTorrent experiment
• Why it is worth studying
  • The dominant file-sharing protocol in the world: real-world data can be used to validate the results from the cluster experiments.
  • A good starting point: there is abundant literature to refer to.
  • A typical complex system: peer-level behaviors are simple and easy to understand, while the system's overall behavior is complicated.
• Experiment target
  • Instrumented clients are widely used in the research area. There are several ready-made ones, but none is full-fledged. We use our own BitTorrent client, based on the official version.
  • Evaluate different implementations, mainly focusing on Mainline Ver. 4.

  7. Some practical issues
• Bypass I/O
  • I/O operations to the hard disk are bypassed, not only because of the limited storage capacity but also because disk I/O is the first performance bottleneck.
  • With the simplest experiment setting (one seeder, one leecher, no limits on the transmission rate, and one MLBT client on each of nodes A and B over a 1-Gb link):

    I/O bypassed? | Stable transmission rate | CPU time on I/O wait
    --------------|--------------------------|---------------------
    No            | 70 MB/s                  | over 85%
    Yes           | 115 MB/s                 | almost 0%
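The I/O-bypass idea above can be illustrated with a file-like sink that discards piece data instead of writing it to disk, while still counting bytes so transfer statistics add up. This is a minimal sketch of the concept; the class name and interface are ours, not MLBT's actual implementation.

```python
import io

# Sketch of the I/O-bypass idea: discard writes, keep only the byte count.
class NullWriter(io.RawIOBase):
    def __init__(self):
        super().__init__()
        self.bytes_discarded = 0

    def writable(self):
        return True

    def write(self, b):
        # Pretend the write succeeded; nothing ever touches the disk.
        self.bytes_discarded += len(b)
        return len(b)

sink = NullWriter()
sink.write(b"piece-data" * 1000)   # 10,000 bytes of payload, all discarded
```

Pointing the client's piece-writing path at such a sink removes the disk from the critical path entirely, which is what lifts the stable rate from 70 MB/s to 115 MB/s in the table above.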

  8. Some practical issues (contd.)
• Running multiple instances on one node
  • Reason: maximize utilization; enlarge the experiment scale with limited resources.
  • Method: application-layer isolation; no hypervisor is used. Pros &amp; cons?
  • Lots of nasty issues need to be taken care of, e.g. I/O overheads, storage issues, system parameters.
  • Bypass the write operations; redirect the read operations.
[Figure: several MLBT instances on one node, each sending and receiving; WRITE operations are blocked while every READ is redirected to a single shared file.]
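The "bypass writes, redirect reads" scheme can be sketched as a single hook through which every instance opens its files: reads are served from one shared on-disk copy, writes go to the null device. `instance_open` is a hypothetical helper made up for illustration; a real client would patch its own file-access layer instead.

```python
import os
import tempfile

# One real copy of the payload on disk, shared by every instance.
shared = tempfile.NamedTemporaryFile(delete=False)
shared.write(b"x" * 1024)
shared.close()

def instance_open(path, mode="r"):
    """Hypothetical per-instance file hook: discard writes, share reads."""
    if "w" in mode or "a" in mode:
        return open(os.devnull, "wb")      # bypass the write entirely
    return open(shared.name, "rb")         # all instances read one file

f = instance_open("downloads/instance_7/file.dat")
data = f.read()
f.close()
```

With this scheme the storage cost is one file per torrent regardless of how many instances run on the node, at the price of losing any per-instance on-disk state.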

  9. Some practical issues (contd.)
• Tune the parameters
  • The default parameters may work well on a low-bandwidth home connection, but some of them are not suitable on a high-performance cluster.
  • Sending buffer (reduces write operations to the network interface), slice size (reduces read operations). Control the number of concurrent uploads, which is calculated from the upload rate.
• Other restrictions
  • For example, ip_local_port_range = 32768 ~ 61000 (28232 ports available);
  • CPU, memory, max sockets, max open files, max processes, etc.
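A quick back-of-the-envelope check of the per-node limits mentioned above can be done from Python. The port range matches the default Linux ip_local_port_range quoted on the slide; the 50-sockets-per-client budget is an assumed figure for illustration, not a measured value.

```python
import resource

# Sanity-check node limits before packing many client instances onto it.
port_lo, port_hi = 32768, 61000              # default ip_local_port_range
ephemeral_ports = port_hi - port_lo          # 28232 usable local ports

# Soft limit on open file descriptors for this process (sockets count too).
soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)

sockets_per_client = 50                      # assumption, not measured
max_by_ports = ephemeral_ports // sockets_per_client
print(f"port range allows about {max_by_ports} instances per node; "
      f"open-file soft limit is {soft_nofile}")
```

Whichever of the limits (ports, file descriptors, memory, CPU) binds first determines the real per-node experiment scale.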

  10. Some practical issues (contd.)
[Figure]

  11. Some practical issues (contd.)
[Figure: plots with marked "safe region" areas]

  12. Two-node experiment
• Homogeneous experiment: all MLBT clients with the same configuration.
• Two types of experiments: upload-constrained and download-constrained.
• Two types of outgoing connections: connections to the native peers (on the same node) and connections to the foreign peers (on the other node).
[Figure: Node A and Node B, each running several MLBT instances]
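The native/foreign split described above amounts to labeling each connection by whether the peer runs on the same physical node. A tiny helper makes the bookkeeping concrete; the `(peer_id, node)` list format is a made-up illustration, not the client's actual data structure.

```python
# Classify an instance's connections as native (same node) or foreign
# (different node). The peer-list shape is an assumption for illustration.

def split_connections(local_node, peers):
    """peers: iterable of (peer_id, node) pairs -> (#native, #foreign)."""
    native = sum(1 for _, node in peers if node == local_node)
    return native, len(peers) - native

peers = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "B"), ("p5", "B")]
native, foreign = split_connections("A", peers)
```

Tracking this ratio matters because native traffic never crosses the inter-node link, so a skewed split changes where the bandwidth constraint actually bites.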

  13. Change in BT's behaviors
• Two-node experiment: upload-constrained
[Figure]

  14. Change in BT's behaviors
• Two-node experiment: download-constrained
[Figure]

  15. How about three nodes?
• Homogeneous experiment: all MLBT clients with the same configuration.
[Figure: Nodes A, B, and C, each running several MLBT instances]

  16. Change in BT's behaviors
• How about 3 nodes? (download-constrained)
[Figure]

  17. Conclusion
• To experiment on a cluster, we must consider
  • the experiment target (protocols and implementations);
  • platform configurations and limitations (depending on the underlying OS);
  • network configurations and topology.
• Many things can be bottlenecks, so the experiment should be carefully designed!

  18. Conclusion (contd.)
• Any other conclusions here?
  • It seems that experimenting on a cluster is "dangerous": too many underlying details, too many hacks, and too many restrictions can mess up an experiment.
  • Don't forget the benefits of the cluster!
  • It is feasible, but we need to be very careful.
    • Always know, or at least try to know, every underlying detail.
    • Always design rational experiments.
    • Always play in the safe area.

  19. Thank you!
Liang Wang, Dept. of Computer Science

  20. Extra figures of experiments on Ukko
[Figure: Mainline Ver. 4: peers per node vs. upload rate (MB/s), with capacity-planning curves y=560/x and y=244/x+20; CDF of average download rate (KB/s) for 5000 to 30000 nodes.]
[Figure: Aria2 evaluation: CDF of average upload and download rates (KB/s); Aria2 evaluation with 10450 peers, UTPEX enabled: ratio of upload connections to the native peers vs. peers per node.]
