
Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid’5000 Testbed



  1. Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid’5000 Testbed Sébastien Badia, Alexandra Carpen-Amarie, Adrien Lèbre, Lucas Nussbaum Grid’5000 S. Badia, A. Carpen-Amarie, A. Lèbre, L. Nussbaum Testing IaaS Clouds on Grid’5000 1 / 24

  2. Testing IaaS cloud stacks
     ◮ IaaS Cloud stacks: complex software
     ◮ Need to be tested in realistic setups
     ◮ But testing is often limited to:
        - Single-machine installations
        - Static deployments
     This talk: enabling large-scale testing of IaaS Cloud stacks on a shared, reconfigurable testbed

  3. Outline
     1 Quick overview of the Grid’5000 testbed
     2 Support for Virtualization and Cloud on Grid’5000
     3 Deploying IaaS Clouds on Grid’5000

  4. Grid’5000
     ◮ Testbed for research on distributed systems
        - High Performance Computing
        - Grids
        - Peer-to-peer systems
        - Cloud computing
     ◮ History:
        - 2003: Project started (ACI GRID)
        - 2005: Opened to users
     ◮ Funding: Inria, CNRS and many local entities (regions, universities)
     ◮ Only for research on distributed systems → no production usage
       Litmus test: are you interested in the result of the computation?
        - Free nodes during daytime to prepare experiments
        - Large-scale experiments during nights and weekends
     ◮ Also a scientific object: how does one design such a testbed?
     [Figure: software stack layers, top to bottom: Application; Programming environment; Application runtime; Grid, Cloud or P2P middleware; Operating system; Networking]

  5. Leading to results in several fields
     Cloud: Sky computing on FutureGrid and Grid’5000
     ◮ Nimbus cloud deployed on 450+ nodes
     ◮ Grid’5000 and FutureGrid connected using ViNe
     HPC: factorization of RSA-768
     ◮ Feasibility study: prove that it can be done
     ◮ Different hardware → understand the performance characteristics of the algorithms
     Grid: evaluation of the gLite grid middleware
     ◮ Fully automated deployment and configuration on 1000 nodes (9 sites, 17 clusters)

  6. Current status
     ◮ 11 sites (1 outside France)
     ◮ 26 clusters
     ◮ 1700 nodes
     ◮ 7400 cores
     ◮ Diverse technologies:
        - Intel (60%), AMD (40%)
        - CPUs from one to 12 cores
        - Myrinet, Infiniband {S,D,Q}DR
        - Two GPU clusters
     ◮ 500+ users per year
     [Map: sites at Lille, Luxembourg, Reims, Orsay, Nancy, Rennes, Lyon, Bordeaux, Grenoble, Toulouse and Sophia]

  7. Backbone network
     Dedicated 10 Gbps backbone provided by RENATER (French NREN)
     Work in progress:
     ◮ packet-level and flow-level monitoring
     ◮ bandwidth reservation and limitation

  8. Using Grid’5000: the user’s point of view
     ◮ Key tool: SSH
     ◮ Private network: connect through access machines
     ◮ Data storage: NFS (one server per Grid’5000 site)
     [Figure: the user reaches a site access machine over SSH (e.g. access.nancy.grid5000.fr, access.lyon.grid5000.fr), then the site frontend (e.g. frontend.lyon aka lyon), which offers OARSUB and KADEPLOY, then the site's cluster nodes over OARSH or SSH (e.g. capricorne-12.lyon, grelon-32.nancy, genepi-21.grenoble, gdx-102.orsay, azur-42.sophia). Sites are linked by the Grid’5000 dedicated backbone.]

  9. Resource management with OAR
     ◮ Batch scheduler with specific features
        - interactive jobs
        - advance reservations
        - powerful resource matching
     ◮ Resources hierarchy: cluster / switch / node / cpu / core
     ◮ Properties: memory size, disk type & size, hardware capabilities, network interfaces, ...
     ◮ Other kinds of resources: VLANs, IP ranges for virtualization
     I want 1 core on 2 nodes of the same cluster with 4096 MB of memory and Infiniband 10G + 1 cpu on 2 nodes of the same switch with dual-core processors for a walltime of 4 hours:
     oarsub -I -l "{memnode=4096 and ib10g='YES'}/cluster=1/nodes=2/core=1+{cpucore=2}/switch=1/nodes=2/cpu=1,walltime=4:0:0"
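The request string above follows a regular pattern: groups joined by `+`, each group an optional property filter in braces followed by a resource hierarchy path, with the walltime appended after a comma. A minimal Python sketch of that composition follows. The helper names `resource_group` and `resource_request` are hypothetical and not part of OAR; the sketch only builds the argument list and does not call `oarsub`.

```python
def resource_group(hierarchy, properties=None):
    """Build one resource group such as '{filter}/cluster=1/nodes=2/core=1'.

    hierarchy: ordered (level, count) pairs, outermost level first.
    properties: optional property filter placed in braces before the path.
    """
    path = "/".join(f"{level}={count}" for level, count in hierarchy)
    return f"{{{properties}}}/{path}" if properties else path


def resource_request(groups, walltime=None):
    """Join resource groups with '+' and append the walltime, if any."""
    request = "+".join(groups)
    return f"{request},walltime={walltime}" if walltime else request


# Rebuild the request shown on the slide.
request = resource_request(
    [
        resource_group([("cluster", 1), ("nodes", 2), ("core", 1)],
                       "memnode=4096 and ib10g='YES'"),
        resource_group([("switch", 1), ("nodes", 2), ("cpu", 1)],
                       "cpucore=2"),
    ],
    walltime="4:0:0",
)

# Argument list suitable for subprocess.run on a site frontend.
command = ["oarsub", "-I", "-l", request]
print(" ".join(command[:3]), f'"{request}"')
```

Passing the request as a single list element avoids shell quoting problems with the braces and the quoted `'YES'` values.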

  10. Resource management with OAR - visualization
     [Figures: resources status view; Gantt chart of reservations]



  13. Description, selection, verification of resources
     ◮ Describing resources → understand results
        - Detailed description on the Grid’5000 wiki
        - Machine-parsable format (JSON)
     ◮ Selecting resources
        - OAR database filled from JSON
          oarsub -p "wattmeter='YES' and gpu='YES'"
     ◮ Verifying resources
        - G5K-checks: validates resources against their description (detects hardware failures and misconfigurations at each boot)
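The describe, select, verify cycle can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the JSON fields, the node name, and the helper names `property_filter` and `check_node` are hypothetical, and the real Grid’5000 description schema and the G5K-checks implementation are not shown in the slides.

```python
import json

# Hypothetical node description; the actual JSON schema may differ.
DESCRIPTION = json.loads("""
{
  "node": "example-1",
  "cpu_cores": 8,
  "memory_mb": 16384,
  "gpu": true,
  "wattmeter": true
}
""")


def property_filter(description, names):
    """Build an OAR-style property expression for boolean properties."""
    terms = []
    for name in names:
        value = "YES" if description[name] else "NO"
        terms.append(f"{name}='{value}'")
    return " and ".join(terms)


def check_node(description, detected):
    """Return (key, expected, found) for every value that does not match."""
    return [
        (key, expected, detected.get(key))
        for key, expected in description.items()
        if detected.get(key) != expected
    ]


# Selection: derive the filter passed to `oarsub -p`.
selection = property_filter(DESCRIPTION, ["wattmeter", "gpu"])

# Verification: a node that lost half its memory is reported as a mismatch.
detected = dict(DESCRIPTION, memory_mb=8192)
mismatches = check_node(DESCRIPTION, detected)
print(selection)
print(mismatches)
```

Keeping one machine-readable description as the source for both the scheduler properties and the boot-time check is what lets the two stay consistent.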

  14. Reconfiguring the testbed with Kadeploy
     ◮ Provides a Hardware-as-a-Service Cloud infrastructure
     ◮ Enables users to deploy their own software stack & get root access
     ◮ Standard environments provided to users
        - Customizations automated using Kameleon
     ◮ Scalable, efficient, reliable and flexible:
        - Chain-based and BitTorrent environment broadcast
        - 255 nodes deployed in 3 minutes
     ◮ Command-line interface & REST API for scripting
     http://kadeploy3.gforge.inria.fr/
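To see why a chain-based broadcast scales, consider an idealised model, sketched below in Python. It assumes the image is split into equal chunks and that every node can receive one chunk and forward another in the same time step. This is a textbook pipeline model, not Kadeploy's actual implementation, and the chunk count used is a hypothetical value.

```python
def sequential_steps(nodes, chunks):
    """Server sends the whole image to each node in turn."""
    return nodes * chunks


def chain_steps(nodes, chunks):
    """Pipelined chain: the last chunk needs chunks steps to leave the
    server side, then nodes - 1 further hops to reach the end of the chain."""
    return chunks + nodes - 1


def simulate_chain(nodes, chunks):
    """Step-by-step simulation used to cross-check chain_steps."""
    received = [0] * nodes  # number of chunks held by each node
    steps = 0
    while received[-1] < chunks:
        # Walk from the tail so each chunk moves at most one hop per step.
        for i in range(nodes - 1, 0, -1):
            if received[i - 1] > received[i]:
                received[i] += 1
        if received[0] < chunks:
            received[0] += 1  # the server feeds the head of the chain
        steps += 1
    return steps


# 255 nodes as on the slide; 100 chunks is an assumed value.
print(sequential_steps(255, 100), chain_steps(255, 100))
```

In this model the cost of adding a node to the chain is one extra step, whereas sending the image to each node in turn costs one full image transfer per node.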

  15. Customizing the experimental environment
     ◮ Reconfigure experimental conditions with Distem
        - Introduce heterogeneity in a homogeneous cluster
        - Emulate complex network topologies
     [Figure: CPU cores 0 to 7 of one machine assigned with different CPU performance levels to virtual nodes VN 1 to VN 4; emulated topology of nodes n1 to n5 with interfaces if0 and if1, each link direction labelled with a bandwidth and latency such as "5 Mbps, 10ms" or "4 Mbps, 12ms"]
     http://distem.gforge.inria.fr/
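A rough way to reason about an emulated topology is to treat each link direction as a (bandwidth, latency) pair, as in the figure labels. The Python sketch below is a simple analytical model under that assumption: path latency is the sum of the link latencies and path bandwidth is the minimum of the link bandwidths. It does not use the Distem API, and the two-hop path is a hypothetical example.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    bandwidth_mbps: float
    latency_ms: float


def path_latency_ms(path: List[Link]) -> float:
    """One-way latency of a path: latencies add up hop by hop."""
    return sum(link.latency_ms for link in path)


def path_bandwidth_mbps(path: List[Link]) -> float:
    """Achievable bandwidth of a path: limited by the slowest link."""
    return min(link.bandwidth_mbps for link in path)


def transfer_time_s(path: List[Link], size_bytes: int) -> float:
    """Estimated time to push size_bytes through the path (no protocol overhead)."""
    bits = size_bytes * 8
    return path_latency_ms(path) / 1000 + bits / (path_bandwidth_mbps(path) * 1_000_000)


# Hypothetical two-hop path using link values of the kind shown in the figure.
path = [Link(5, 10), Link(4, 12)]
print(path_bandwidth_mbps(path), path_latency_ms(path), transfer_time_s(path, 1_000_000))
```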

  16. Virtualisation & Cloud XP requirements
     ◮ Efficient provisioning of machines → Kadeploy
     ◮ IP addresses for Virtual Machines
     ◮ Two different solutions on Grid’5000:
        - G5K-Subnets
        - KaVLAN

  17. Network reservation with G5K-subnets
     ◮ Grid’5000 enables different users to run experiments concurrently
       → Need a mechanism to provide IP ranges for virtual machines
     ◮ G5K-subnets adds IP range reservation to OAR
       oarsub -l slash_22=2+nodes=8 -I
     ◮ IP ranges are routable inside Grid’5000
     ◮ But no isolation: one can steal IP addresses
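As a quick check of what `slash_22=2` provides, the Python sketch below uses the standard `ipaddress` module to count the addresses in two /22 ranges and to hand out the first few to virtual machines. The two networks are hypothetical placeholders; the ranges actually granted are chosen by OAR at reservation time.

```python
import ipaddress
from itertools import islice

# Hypothetical reserved ranges, standing in for what OAR would assign.
reserved = [
    ipaddress.ip_network("10.144.0.0/22"),
    ipaddress.ip_network("10.144.4.0/22"),
]


def total_addresses(networks):
    """Total number of addresses across all reserved ranges."""
    return sum(net.num_addresses for net in networks)


def vm_addresses(networks, count):
    """Return the first `count` usable host addresses across the ranges."""
    hosts = (host for net in networks for host in net.hosts())
    return [str(host) for host in islice(hosts, count)]


print(total_addresses(reserved))   # each /22 holds 2**(32 - 22) = 1024 addresses
print(vm_addresses(reserved, 3))
```

With 8 nodes reserved alongside the two ranges, this leaves room for a few hundred virtual machine addresses per node.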

  18. Network isolation with KaVLAN
     ◮ Reconfigures switches for the duration of a user experiment to achieve complete layer 2 isolation:
        - Avoid network pollution (broadcast, unsolicited connections)
        - Enable users to start their own DHCP servers
        - Experiment on Ethernet-based protocols
        - Interconnect nodes with another testbed without compromising the security of Grid’5000
     ◮ Relies on 802.1q (VLANs)
     ◮ Compatible with many kinds of network equipment
        - Can use SNMP, SSH or telnet to connect to switches
        - Supports Cisco, HP, 3Com, Extreme Networks and Brocade
     ◮ Controlled with a command-line client or a REST API
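For readers unfamiliar with 802.1Q, the isolation rests on a 4-byte tag carried in each Ethernet frame: a 16-bit TPID of 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible flag and a 12-bit VLAN ID. The Python sketch below builds such a tag. It illustrates the frame format only; on a testbed the tagging is done by the switches, and the helper names and the VLAN ID 100 are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vid, pcp=0, dei=0):
    """Return the 4-byte 802.1Q tag for the given VLAN ID and priority bits."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits and DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_frame(frame, vid):
    """Insert the tag after the 12 bytes of destination and source MAC."""
    return frame[:12] + vlan_tag(vid) + frame[12:]


# Untagged frame: zeroed MAC addresses, IPv4 EtherType, dummy payload.
untagged = bytes(12) + b"\x08\x00" + b"payload"
tagged = tag_frame(untagged, 100)
print(tagged[12:16].hex())
```

The 12-bit VLAN ID field is why a single switch fabric can carry at most a few thousand isolated networks at once.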
