A.A. 2010-2011
Tecnologie per la Virtualizzazione delle Reti di Calcolatori
Massimo RIMONDINI
RCNG – 02/11/10
Virtualization

HA! HA! Easy! ...D'Oh!

- “...the act of decoupling the (logical) service from its (physical) realization...”
- “execution of software in an environment separated from the underlying hardware resources”
- “sufficiently complete simulation of the underlying hardware to allow software, typically a guest operating system, to run unmodified”
- “complete simulation of the underlying hardware”
Virtualization
- Full virtualization (emulation)
- Partial virtualization
- Paravirtualization (OS-assisted virtualization)
- Hardware-assisted virtualization
- OS-level virtualization
Full Virtualization (Emulation)
- Emulation of a fully fledged hardware box (e.g., x86)
- Binary translation: for non-virtualizable instructions (which have different semantics in rings ≠ 0)
- Direct execution: for performance
- Examples: VirtualBox, Parallels, ~VMware, Microsoft Virtual PC, QEMU, Bochs
Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
Partial Virtualization
- E.g., address space virtualization
- Supports multiple instances of a specific hardware device
- Does not support running a guest OS
- Examples: FreeBSD network stack virtualization project, IBM M44/44X
- (Mostly) of historical interest: address space “virtualization” is a basic component of modern OSs
Paravirtualization (OS-assisted virtualization)
- VMM/hypervisor
- The guest OS communicates with the hypervisor
- Changes to the guest OS (to prevent non-virtualizable instructions from reaching bare metal)
- Better performance
- Support for hardware-assisted virtualization
- Examples: Xen, VMware, Microsoft Hyper-V, Oracle VM Server for SPARC, VirtualBox
Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
Hypervisor (and VMM)
Hypervisor
Type 1 (native): runs on bare metal; loads prior to the OS
Microsoft Hyper-V, VMware vSphere
Type 2 (hosted): runs within a conventional OS
Virtual Machine Monitor
Same as hypervisor (?)
Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
Transparent Paravirtualization
Photo credit goes to Flickr user Alexy.
Huh?
Transparent Paravirtualization
- Virtual Machine Interface (VMI): a single VMI-compliant guest kernel
- VMI calls may have two implementations:
  - inline native instructions (run on bare metal)
  - indirect calls to the hypervisor
- paravirt-ops: IBM + VMware + Red Hat + XenSource; part of the Linux kernel since 2.6.20
Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
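Whether a running Linux kernel was built with the paravirt-ops interface can be checked in its build configuration; a quick sketch (the config file path varies by distribution):

  grep CONFIG_PARAVIRT /boot/config-$(uname -r)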
Hardware-assisted Virtualization
- The hypervisor runs below Ring 0
- Sensitive calls are automatically trapped to the hypervisor
- Effective guest isolation
- AMD-V (Pacifica), Intel VT-x (Vanderpool)
- Examples: VirtualBox, KVM, Microsoft Virtual PC, Xen, Parallels, ...
Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
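Whether the host CPU offers these extensions can be checked from Linux; a small sketch (vmx marks Intel VT-x, svm marks AMD-V):

  grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u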
OS-Level Virtualization
- Single OS/kernel: actually isolation (of contexts), not virtualization
- No emulation overhead
- Requires a host kernel patch
- Guests share the same system call interface, which limits the set of runnable guests
- Processes in a virtual server are regular processes on the host
- Resources (e.g., memory) can be requested at runtime
- Examples: Linux-VServer, Parallels Virtuozzo Containers, OpenVZ, Solaris Containers, FreeBSD jails; to a certain extent, UMview and UML
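For flavor (an added sketch, not from the slides): the classic FreeBSD jail(8) invocation confines a process tree to a directory, hostname, and IP address, while the jailed processes remain ordinary processes on the host (all names and addresses below are placeholders):

  jail /jails/www www.example.org 192.0.2.10 /bin/sh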
| | Able to run guest OS | Unmodified guest | Unmodified host | Overhead | Flexibility |
|---|---|---|---|---|---|
| Full virtualization (emulation) | ✓ | ✓ | Depends | High | Limited |
| Partial virtualization | ✗ | ✓ | ✓ | Low | Limited |
| Paravirtualization (OS-assisted) | ✓ | ✗ | ✗ | Low | High |
| Hardware-assisted virtualization | ✓ | ✓ | ✓ | Mostly offloaded to hardware | Average |
| OS-level virtualization | Almost | ✗ | ✗ | Low | High |
Which virtualization for networking?
Requirements
- Depend much on the context: operational network vs. experimentation
Anyway...
- Performance and scalability
- Flexibility: configuration, programmability (for development)
- Support for mobility
- Strong isolation
- Ahem... usability
Tools for Managing Virtual Network Scenarios
Netkit
- Roma Tre University
- VM engine: UML
- VM interconnection: uml_switch
- Core: shell scripts
- Routing engine: Quagga, XORP
- Lab description: (mostly) uses the native router language
- Lightweight; easy-to-share labs
- Several networking technologies, including MPLS forwarding
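A minimal lab, for flavor (an added sketch: machine and collision domain names are made up, and an installed Netkit is assumed):

  mkdir mylab && cd mylab
  # lab.conf: attach eth0 of pc1 and pc2 to the same collision domain A
  cat > lab.conf <<'EOF'
  pc1[0]=A
  pc2[0]=A
  EOF
  # commands each VM runs at boot
  echo 'ifconfig eth0 10.0.0.1 netmask 255.255.255.0 up' > pc1.startup
  echo 'ifconfig eth0 10.0.0.2 netmask 255.255.255.0 up' > pc2.startup
  lstart    # boots the whole lab in UML machines; lhalt shuts it down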
VNUML
- Universidad Politécnica de Madrid, Telefónica I+D
- VM engine: UML
- VM interconnection: uml_switch
- Core: Python + Perl scripts
- Routing engine: Quagga
- Lab description: XML
- Build, then play
- Support for distributed emulation (segmentation): round robin, or round robin weighted by CPU load before deployment
VNUML
- Hosts are managed via ssh, scp, rsync; switches via SNMP, telnet, TFTP
- Backplane: one or more 802.1Q-compliant switches
- Host-switch and switch-switch connections are trunks
Marionnet
- Université Paris 13
- VM engine: UML
- VM interconnection: uml_switch, VDE
- Core: OCaml
- Routing engine: Quagga
- Lab description: GUI (dot-based layout)
- Ability to export a project file for reuse
- Network impairments (delay, loss → unidirectional links, bandwidth, flipped bits)
- Switch status LEDs
Marionnet
Introducing indirection to support...
- ...stable endpoints
- ...port “defects”
GINI
- McGill University
- VM engine: UML
- VM interconnection: customized uml_switch, implementation of a wireless channel
- Core: C + Python
- Routing engine: custom implementation (compliant with the Linux TCP/IP stack)
- Lab description: GUI, XML
- Integrated task manager to start/stop nodes
- Real-time performance plots
Cloonix
- Vincent Perrier
- VM engine: UML, (QEMU+)KVM, OpenWRT
- VM interconnection: improved uml_switch
- Core: C
- Routing engine: N/A
- Lab description: custom markup
- Several customizations and hacks: ability to plot the value of any kernel variable; the switch supports TCP sockets and real-time configuration with XML messages; a CD-ROM image is built on the fly for machine differentiation
IMUNES
- University of Zagreb
- VM engine: N/A (network stack virtualization)
- VM interconnection: N/A
- Core: N/A
- Routing engine: N/A
- GUI
- Based on FreeBSD VirtNET
IMUNES
VirtNET: network state replication. In a vimage it is possible to:
- configure network interfaces
- open sockets
- run processes
(In some way) similar to UMview.
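For flavor (an added sketch with low certainty on the exact flags, which vary across VirtNET prototype versions; see the prototype's vimage(8) page):

  vimage -c v1             # create a new virtual image
  vimage v1 ifconfig -a    # run a command inside it: it sees its own interfaces
  vimage v1 /bin/sh        # a shell confined to v1's network stack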
Virtual Routers
DynaMIPS
- Christophe Fillot (University of Technology of Compiègne)
- Supported platforms (as of v0.2.7): Cisco 7200 (NPE-100 to NPE-400), Cisco 3600 (3620, 3640, and 3660), Cisco 2691, Cisco 3725, Cisco 3745
- No acceleration: CPU idle times must be tuned
- “Of course, this emulator cannot replace a real router: you should be able to get a performance of about 1 kpps [...], to be compared to the 100 kpps delivered by a NPE-100 [...]. So, it is simply a complementary tool to real labs for administrators of Cisco networks or people wanting to pass their CCNA/CCNP/CCIE exams.”
DynaMIPS
Fully virtualized hardware:
- ROM, RAM, NVRAM
- Chassis
- Console, AUX
- PCMCIA ATA disks
- ATM/Frame Relay/Ethernet virtual switch between emulator instances
- Port adapters, network modules
Interface binding:
- UNIX socket
- VDE
- tap host interface (optionally via libpcap)
- UDP port
Notes:
- Some opcodes are missing (mostly FPU)
- Can manage multiple instances (“hypervisor” mode)
- Development stalled, but still a milestone
DynaMIPS
Frontends: Dynagen (a Python CLI frontend), Dynagui, GNS3
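A taste of Dynagen's lab description format (an added sketch: the IOS image path is a placeholder, and in practice an idlepc value is computed per image to tame the CPU usage mentioned above):

  # dynamips must already be listening in hypervisor mode: dynamips -H 7200
  cat > lab.net <<'EOF'
  [localhost]
      [[7200]]
      image = /opt/ios/c7200-ios.bin
      npe = npe-400
      [[ROUTER R1]]
      f0/0 = R2 f0/0
      [[ROUTER R2]]
  EOF
  dynagen lab.net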
So, Performance and Scalability are 2 Bugaboos...
- Carrier-grade equipment: 40 Gbps to 92 Tbps
- Per-packet processing capability must scale with O(line_rate)
- Aggregate switching capability must scale with O(port_count × line_rate)
- But software routers need not run on a single server: RouteBricks
- Columbia + Intel + UCLA + Berkeley + ... (a 10-author paper!)
- Click-based
- Software routers: 1 to 3 Gbps
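As a back-of-the-envelope check (added, not from the slides) of why minimum-size packets are the stress case: a single 10 Gbps line filled with 64-byte Ethernet frames, i.e. 84 bytes on the wire once preamble and inter-frame gap are counted, requires

\[ \frac{10 \times 10^{9}\ \text{bit/s}}{84 \times 8\ \text{bit/packet}} \approx 14.88\ \text{Mpps} \]

which matches the order of magnitude of the <19 Mpps bottleneck measured below.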
RouteBricks
Cluster router architecture
- Parallelism across servers: nodes can make independent decisions on a subset of the overall traffic
- Parallelism within servers: CPU, I/O, memory
RouteBricks
Intra-cluster routing
- Valiant Load Balancing (VLB): source → random intermediate node → destination
- Randomizes input traffic
- No centralized scheduling
- Beware of reordering
Topology
- Full mesh: not feasible (server fanout is limited!)
- Commodity Ethernet switches: not viable! (missing load-sensitive routing features; cost)
- Tori and butterflies
RouteBricks
Butterfly topology
2-ary 4-fly
RouteBricks
Experimental setup
- 4 Intel Xeon servers (Nehalem microarchitecture)
- One 10 Gbps external line each
- Full-mesh topology
Bottleneck
- 64-byte packets: <19 Mpps sustained
- Caused by the CPU: the per-byte CPU load is higher for smaller packets
Programmability
- NIC driver + 2 Click elements
Click
- UCLA
- A modular software router
- UNIX-pipe-like composition
- Implemented as a Linux kernel extension
- 333,000 64-byte packets per second on a 700 MHz Pentium III
Two key abstractions: elements and connections.
Click
Element: a packet processing module performing a simple computation (e.g., decrement TTL, routing table lookup, packet queueing, etc.)
- A C++ object with state
- Has multiple input/output ports
- May have configuration settings
- May export an interface (e.g., a method to report queue length)
Click
Connection: a packet handoff path
- Push processing: for unsolicited packets (e.g., from a device)
- Pull processing: for packet schedulers
- Only one connection per port
- Element ports can be push (black), pull (white), or agnostic (outline)
- Push and pull are not mixable!
Example: simple router queue
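The canonical instance, as a sketch in Click's configuration language (runnable with the userspace driver; eth0 is a placeholder and requires sufficient privileges): a push source feeds a Queue, and a pull sink drains it, so the Queue is exactly where push meets pull:

  cat > simple.click <<'EOF'
  FromDevice(eth0) -> Queue(1024) -> ToDevice(eth0);
  EOF
  click simple.click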
Click
Note: queues do not live in ports: they are objects (explicit elements).
Scheduling
- Single thread (a multithreaded implementation is available)
- Scheduling unit: the element
- Scheduling order: follows the packet's path along the graph
- Elements may be implicitly scheduled when their push/pull methods are called
- Elements may have timers
Click
Flow-based contexts
Answer questions of the form: “If I were to emit a packet on my second output...
- ...where might it go?”
- “...which Queues might it encounter?”
- “...and it stopped at the first Queue it encountered, where might it stop?”
Click
Declarative language
  src :: FromDevice(eth0);
  ctr :: Counter;
  sink :: Discard;
  src -> ctr;
  ctr -> sink;

or, equivalently:

  FromDevice(eth0) -> Counter -> Discard;
Click
- The Element class has ~20 virtual functions
- Most default implementations are fine
- Usually it suffices to override push, pull, and run_scheduled
Sample transparent element:
  class NullElement : public Element {
  public:
    // One input, one output; no per-packet processing.
    NullElement() { add_input(); add_output(); }
    const char *class_name() const { return "Null"; }
    NullElement *clone() const { return new NullElement; }
    // AGNOSTIC: the ports adapt to push or pull depending on the neighbors.
    const char *processing() const { return AGNOSTIC; }
    // Push path: hand the packet to output port 0 untouched.
    void push(int port, Packet *p) { output(0).push(p); }
    // Pull path: fetch a packet from input port 0 on demand.
    Packet *pull(int port) { return input(0).pull(); }
  };
Click
- Fine-grained elements are preferred (not easy with BGP)
- Shared structures (e.g., routing tables) are incorporated into the packet forwarding path
Example: IP Router
See page 12 of [Kohler]...
Virtual Switches
“Ordinary” Virtual Switch
A piece of software working at layer 2/3, inside the hypervisor or the hardware management layer
Can we do any better?
If we are in a virtual scenario, the network layer can...
- ...know about existing hosts (MAC/IP addresses) and their “movements”
- ...know the operational mode of interfaces (e.g., promiscuous)
- ...know about multicast memberships
- ...handle a flat topology (all nodes are leaves, so no STP is needed)
- ...know the OS run by virtual machines (and, for example, run an OS-aware deep packet inspection)
- ...support migration outside of a single subnet
VMware’s Approach
vNetwork Distributed switch
- “VMware’s next generation virtual networking solution for spanning multiple hosts with a single virtual switch representation”
- Private VLANs (restrict communication between virtual machines on the same VLAN)
- Network VMotion: tracking of VM networking state
- 3rd-party virtual switch support (Cisco Nexus 1000V Series)
- Bi-directional traffic shaping
VEPA
- Virtual Ethernet Port Aggregator: an IEEE standard proposal
- Idea: off-load switching activities from hypervisor-based virtual switches to physical switches
- VEPA + OVF: the future? (OVF, Open Virtualization Format: metadata describing a virtual machine)
- June 2009: Linux kernel patch for VEPA support
- Tagged (VEPA-aware switch) and tagless (VEPA-agnostic switch that bounces packets back) variants
VEPA
From Hudson, Congdon. Tag-less Virtual Ethernet Port Aggregator (VEPA) Proposal. 2009.
OpenvSwitch
- Citrix Systems, Nicira, Intel, NEC, Google, ...
- The virtualization layer is no longer (just) Ethernet-based
- QoS, tunneling, filtering
- Tunneling supports transparent inter-subnet migration without breaking transport sessions
- Interface status migration (e.g., to bind policies to interfaces tightly)
- Support for VLANs and GRE tunnels (intelligent forwarding):
  - all VMs on a single host: VPNs within a single OpenvSwitch instance
  - all VMs on the same LAN: VPNs implemented by VLANs
  - VMs on different subnets: VPNs implemented by GRE tunnels
- Operates in the hypervisor (dom0 in Xen)
- Offers connectivity between VMs and physical interfaces on the host
- Forwarding state and runtime configuration can be altered through programming interfaces
- VEPA-compatible (the control layer is able to manage a VEPA-enabled switch)
OpenvSwitch
Configuration options
- Port mirroring (SPAN and RSPAN, a variant of SPAN that allows remote monitoring)
- QoS policies
- NetFlow
- Bonding
- Unified CLI for managing distributed switches
- Exports interfaces compatible with VDE and Linux bridges
Forwarding
- Ability to manipulate the forwarding table (to support status migration, where “status” = flow counters, ACLs, tunnels, etc.)
- Packet processing based on layer 2/3/4 headers
- Actions: forward (to ≥1 ports), drop, en/decapsulate
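For flavor (an added sketch with placeholder names, using ovs-vsctl, Open vSwitch's own configuration CLI): creating a bridge, attaching a physical uplink and a VM interface, and adding a GRE port of the kind used above for inter-subnet connectivity:

  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 eth0      # physical uplink
  ovs-vsctl add-port br0 vnet0     # virtual machine interface
  ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=192.0.2.2
  ovs-vsctl list-ports br0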
OpenvSwitch
Forwarding paths
- Fast path: kernel-space forwarding engine; small code base (portability, performance, hardware implementation); performance comparable with the Linux Ethernet bridge (pure kernel space, MAC-based forwarding only)
- Slow path: user-space forwarding logic (MAC learning, load balancing) and remote management (NetFlow, OpenFlow)
VDE
- University of Bologna
- Virtual Distributed Ethernet: the Swiss Army knife of emulated networks
- A general VPN
- A mobility support technology
- A tool for network testing
- A reconfigurable overlay
- A privacy-preserving layer
- ...
VDE
Components: vde_switch and vde_cable
vde_switch
- MAC address learning (with aging)
- Ability to operate as a hub
- VLANs
- Fast STP
- Can be connected to a tap interface
vde_cable
- Interconnects vde_switches
- Does not exist... (it is realized by vde_plug)
VDE
vde_cable = vde_plug + dpipe
- vde_plug: attaches to a switch's UNIX socket and copies traffic to/from its standard streams (socket → stdout, stdin → socket)
- dpipe: a bidirectional pipe, connecting the stdout of each command to the stdin of the other

Connecting two local switches:

  dpipe vde_plug /tmp/vde1.ctl = vde_plug /tmp/vde2.ctl

Connecting a local switch to a remote one via dpipe+ssh:

  dpipe vde_plug /tmp/vde.ctl = ssh foo@remote.host.org vde_plug /tmp/vde_remote.ctl
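Putting the pieces together (an added end-to-end sketch; socket paths are arbitrary):

  vde_switch -s /tmp/vde1.ctl -daemon                       # first switch
  vde_switch -s /tmp/vde2.ctl -daemon                       # second switch
  dpipe vde_plug /tmp/vde1.ctl = vde_plug /tmp/vde2.ctl &   # the "cable"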
VDE
vde_cryptcab: an encrypted cable
- dpipe+ssh is not a very good solution: dropping ssh (and thus encryption) is not advisable, yet with ssh we may experience interference between congestion control algorithms, since the emulated traffic travels inside ssh's own TCP connection
- vde_cryptcab is UDP-based and avoids the problem:

  host1$ vde_cryptcab -s /tmp/vde.ctl -p 12000
  host2$ vde_cryptcab -s /tmp/vde_local.ctl -c username@host1:12000
VDE
vde_plug2tap: direct connection from a vde_switch to a host tap interface (the same can be achieved when the vde_switch is created)

  vde_plug2tap --daemon -s /tmp/myvde.ctl tap0
VDE
slirpvde: Internet connectivity without root privileges
- Connections are regenerated on the host (a sort of masquerading)
- Provides DHCP

  slirpvde -d -s /tmp/vde.ctl -dhcp
VDE
wirefilter: emulates real cable features (tunable at runtime)
- % lost packets, delay, duplication, bandwidth, interface speed
- max packet queue capacity, MTU, bit corruption, reordering

  dpipe vde_plug /tmp/vde1.ctl = wirefilter -M /tmp/wiremgmt = vde_plug /tmp/vde2.ctl
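The -M flag exposes a management socket, through which parameters can be changed while traffic flows; a sketch (connecting with unixterm from the vde2 tools; command names follow wirefilter's management console, see its help command):

  unixterm /tmp/wiremgmt
  # at the wirefilter prompt, e.g.:
  #   delay 50     (add 50 ms of delay)
  #   loss 2       (drop 2% of packets)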
VDE
Software that supports VDE as a userspace switch
- VirtualBox
- QEMU/KVM (with wrappers)
- UML
- OpenvSwitch
- DynaMIPS
- ...
Testbeds
PlanetLab
- A collection of machines distributed over the globe
- Hosted by research institutions
- Only accessible by organizations in the Consortium (no fee for academic institutions)
- Each machine runs a software package (PLC, PlanetLab Central):
  - includes a Linux-based OS
  - node bootstrapping
  - management, monitoring, and auditing tools
  - supports distributed virtualization (slicing)
- Uses VNET(+) for traffic isolation between slices
- Stand-alone version for private use: MyPLC
- What would the network administrator at your organization say about the experiments running on your local site?
- Experiments run continuously; active network probing must stay within netiquette
- Service disruption and actions triggering administrative complaints are a no-go!
Emulab
- University of Utah
- A facility:
  - Windows/Linux nodes
  - >1300 network ports that can be connected arbitrarily by remotely setting up VLANs on the switches
  - virtual nodes (Xen-based)
  - arbitrary topologies
- A software system: “a kind of ‘operating system’ for controlling collections of networked devices of all types, for the purpose of controlled experimentation”
- Acceptable use: “in principle, almost any research or experimental use of the testbed by experimenters that have a need for it is appropriate”
- Users map
- >500 nodes:
  - high-end (2.4 GHz quad-core Xeon “Nehalem”, 12 GB 1066 MHz RAM, 2×250 GB SATA disks, 8 GbE interfaces)
  - mid-end (3.0 GHz 64-bit Xeon, 2 GB 400 MHz RAM, 2×146 GB 10 kRPM SCSI disks, 6 GbE interfaces)
  - low-end (600 MHz Intel Pentium III, 256 MB RAM, 13 GB IDE disk, 5 FE interfaces, WiFi)
- 12 switches (Cisco and HP)
- Servers: DB, DNS, users, file, serial line, etc.
- Remote power controllers
VINI
- “a virtual network infrastructure that allows network researchers to evaluate their protocols and services in the wide area”
- Runs on top of PlanetLab
- 42 nodes @ 27 sites
VINI
Approach & technologies
VINI
A VINI router
- XORP: routing engine
- Click: forwarding engine
VINI
Encapsulation
GENI
- NSF project: “a virtual laboratory for exploring future internets at scale”
- Keywords:
  - programmability
  - virtualization and resource sharing
  - federation (among participating organizations)
  - slice-based experimentation
The PlanetLab team is also involved
So, which virtualization for networking?
| Tool | (Node) virtualization type | Node virtualization technology | Link virtualization technology |
|---|---|---|---|
| Netkit | “Paravirtualization” | UML | uml_switch |
| VNUML | “Paravirtualization” | UML | uml_switch |
| Marionnet | “Paravirtualization” | UML | uml_switch + vde |
| GINI | “Paravirtualization” | UML | uml_switch + customizations |
| Cloonix | “Paravirtualization” | UML, QEMU, OpenWRT | uml_switch + customizations |
| IMUNES | OS-level | None | VirtNET |
| DynaMIPS | Emulation | Custom | Custom |
| RouteBricks | None | None | None |
| Click | “Partial virtualization” | API | N/A |
| VMware vNetwork | “OS-level” | N/A | Custom |
| VEPA | “OS-level” | N/A | Custom |
| OpenvSwitch | “OS-level” | N/A | Custom |
| VDE | OS-level | N/A | Custom |
| PlanetLab | None | None | Overlay |
| Emulab | Paravirtualization | Xen | N/A / Overlay / None |
| VINI | “Paravirtualization” | UML | Overlay |
| GENI | N/A | N/A | N/A |
A one-size-fits-all proposal: the Network Hypervisor
- Tunnels, VLANs, VRFs, ...: a pool of forwarding capacity
- Rationale: virtualization should happen at the forwarding layer (amidst tunnels and VMs)
- Network hypervisor: a mapper between the logical and the physical network
  - gets a view of the logical network from the control plane
  - gets a view of the physical topology from a centralized management system
The Network Hypervisor
Layering (figure): control plane → logical forwarding plane (logical forwarding tables, ports) → Network Hypervisor → physical forwarding plane (hardware switches)
The Network Hypervisor
Forwarding in a “hypervised” switch:
1. Mapping the packet to its logical context
2. Logical forwarding
3. Mapping the decision to the physical context
4. Physical forwarding
The logical steps are handled by the hypervisor; physical forwarding is handled by the IGP.
The Network Hypervisor
Prototype implementation
Physical switch:
- L2 over GRE: L2 packets are relevant in the logical context; GRE tunnels are the physical transport
- L2-to-tunnel mapping (= the logical forwarding decision) is handled by the hypervisor
- VLAN/MPLS tags indicate the logical context
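Plain Linux can already build this kind of transport (an added sketch, not the prototype itself: gretap creates an L2-in-GRE tunnel endpoint; names and addresses are placeholders):

  ip link add gt0 type gretap local 192.0.2.1 remote 192.0.2.2
  ip link set gt0 up
  brctl addif br0 gt0    # bridge the tunnel endpoint with local VM ports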
Network hypervisor:
- Distributed OpenFlow controller
- Load balancing, but no enforceable bandwidth on ports and links
- Logical forwarding only at the edge of the network
OpenFlow: an open standard by which high-level routing decisions can be run on a separate server; it supports experimentation without requiring vendors to expose internals.
References
- Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White paper. VMware, 2007.
- What’s New in VMware vSphere 4: Virtual Networking. White paper. VMware, 2009.
- VMware Virtual Networking Concepts. White paper. VMware, 2007.
- Argyraki, Baset, Chun, Fall, Iannaccone, Knies, Kohler, Manesh, Nedevschi, Ratnasamy. Can Software Routers Scale? PRESTO 2008.
- Dobrescu, Egi, Argyraki, Chun, Fall, Iannaccone, Knies, Manesh, Ratnasamy. RouteBricks: Exploiting Parallelism to Scale Software Routers. SOSP 2009.
- Casado, Koponen, Ramanathan, Shenker. Virtualizing the Network Forwarding Plane. PRESTO 2010.
- Galán, Fernández, Ferrer, Martín. Scenario-Based Distributed Virtualization Management Architecture for Multi-host Environments. SVM 2008.
- Loddo, Saiu. How to Implement a Virtual Network Laboratory in Six Months and Be Happy. SIGPLAN Workshop on ML, 2007.
- Kohler, Morris, Chen, Jannotti, Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems 18(3), August 2000.
- Maheswaran, Malozemoff, Ng, Liao, Gu, Maniymaran, Raymond, Shaikh, Gao. GINI: A User-Level Toolkit for Creating Micro Internets for Teaching & Learning Computer Networking. SIGCSE Bulletin, 2009.
- Pfaff, Pettit, Koponen, Amidon, Casado, Shenker. Extending Networking into the Virtualization Layer. HotNets 2009.
- Davoli. VDE: Virtual Distributed Ethernet. Technical report, University of Bologna, 2004.
- Bavier, Feamster, Huang, Peterson, Rexford. In VINI Veritas: Realistic and Controlled Network Experimentation. SIGCOMM 2006.