
Virtualization

Università degli Studi di Roma “Tor Vergata”, Dipartimento di Ingegneria Civile e Ingegneria Informatica

Distributed Systems and Cloud Computing course (SDCC), A.Y. 2018/19, Valeria Cardellini

References

  • “Virtual machines and virtualization of clusters and data centers”, chapter 3 of Distributed and Cloud Computing, http://bit.ly/2xBa2xg
  • “Virtualization”, chapter 3 of Mastering Cloud Computing
  • J.E. Smith, R. Nair, “The architecture of virtual machines”, IEEE Computer, 2005, http://bit.ly/2z5cW0X
  • D. Bernstein, “Containers and Cloud: From LXC to Docker to Kubernetes”, IEEE Cloud Computing, 2014, http://bit.ly/2hqudbf
  • More papers on the course web site


Virtualization

  • A high level of abstraction that hides the details of the underlying implementation
  • Abstraction of computational resources
    – The user is presented with a logical view that differs from the physical one
  • Virtualization can pursue different goals:
    – Reliability, performance, security, …
  • How? By decoupling the architecture and behavior of the hw and sw resources, as perceived by the user, from their physical realization

Virtualization of resources

  • Virtualization of system resources (hw and sw)
    – Virtual machine, container, …
  • Storage virtualization
    – Storage Area Network (SAN), …
  • Network virtualization
    – Virtual LAN (VLAN), Virtual Private Network (VPN), …
  • Data center virtualization


Components of a virtualized environment

  • Three major components:
    – Guest
    – Host
    – Virtualization layer
  • Guest: the system component that interacts with the virtualization layer rather than with the host
  • Host: the original environment where the guest is supposed to be managed
  • Virtualization layer: responsible for recreating the same or a different environment where the guest will operate

Taxonomy of virtualization techniques

  • Execution environment virtualization is the oldest, most popular and most developed area: we will mostly investigate it


Virtual machine

  • A virtual machine (VM) makes it possible to represent the hw/sw resources of a machine differently from their physical reality
    – E.g., the hw resources of the virtual machine (CPU, network card, …) can differ from the physical resources of the real machine
  • A single physical machine can be presented and used as several different computing environments
    – Multiple VMs on a single physical machine

[Figure: VM1, VM2, VM3 running on a virtualization layer above the hardware]

Virtualization: historical notes

  • The VM concept is an “old” idea, defined in the 1960s in a centralized context
    – Conceived to let legacy (existing) software run on very expensive mainframes and to transparently share the (scarce) physical resources
    – E.g., the IBM System/360-67 mainframe
  • In the 1980s, with the shift to PCs, the problem of transparently sharing computing resources is solved by multitasking OSes
    – Interest in virtualization fades


Virtualization: historical notes (2)

  • In the late 1990s, interest revives as a way to make programming special-purpose hw less expensive
    – VMware is founded in 1998
  • The problems of management costs and of the underutilization of heterogeneous hw and sw platforms become more acute
    – Hw changes faster than sw (middleware and applications)
    – Management costs grow and portability decreases
  • Sharing hw and unused computing capacity becomes important again, to reduce infrastructure costs
  • Virtualization is one of the enabling technologies of Cloud computing

Virtualization: advantages

  • Eases compatibility, portability, interoperability and migration of applications and environments
    – Hw independence: Create Once, Run Anywhere
    – Legacy VMs: run old OSes or old applications on new platforms


Virtualization: advantages (2)

  • Enables server consolidation in a data center, with economic, management and energy benefits
    – Multiplexing of multiple VMs on the same server
    – Goal: reduce the total number of servers in use by exploiting them more efficiently
    – Benefits:
      • Lower costs and energy consumption
      • Simpler management, maintenance and upgrade of the servers
      • Less occupied space and shorter downtimes

Virtualization: advantages (3)

  • Allows isolating components that malfunction or are subject to security attacks, increasing the reliability and security of applications
    – Virtual machines of different applications cannot access each other's resources
    – Software bugs, crashes, or viruses in one VM cannot damage other VMs running on the same physical machine
  • Allows performance isolation
    – E.g., through proper scheduling of the physical resources shared among multiple VMs
  • Allows balancing the load across servers
    – Through VM migration from one server to another


Use of virtualized execution environments

  • In personal and educational settings
    – To run different OSes simultaneously
    – To simplify sw installation
  • In professional settings
    – For debugging, testing and development of applications
  • In enterprise settings
    – To consolidate the data center infrastructure
    – To guarantee business continuity: entire systems are encapsulated in single files (system images) that can be replicated, migrated or reinstalled on any server

Virtual machine architectures

At which level should virtualization be realized?

  • It strongly depends on the interfaces offered by the various system components
    – Interface between hw and sw (user ISA: non-privileged machine instructions, invokable by any program) [interface 4]
    – Interface between hw and sw (system ISA: machine instructions invokable only by privileged programs) [interface 3]
    – System calls [interface 2]
    – Library calls (API) [interface 1]
  • ABI (Application Binary Interface): interface 2 + interface 4
  • Goal of virtualization
    – To mimic the behavior of these interfaces

Reference: “The architecture of virtual machines”


Machine reference model

An application uses library functions (A1), makes system calls (A2), and executes machine instructions (A3)

[Figure: machine reference model, with Applications on top of Libraries on top of the Operating System on top of Hardware; the API (library calls, A1), ABI (system calls, A2) and ISA (system ISA + user ISA, A3) interfaces sit between the layers]

Levels of abstraction for virtualization

  • Five levels of virtualization:
    – ISA level
      • ISA emulation requires binary translation and its optimization, e.g., dynamic binary translation
    – Hardware level (also system VMs): our focus
      • Based on the virtual machine monitor (VMM), also called hypervisor
      • VMM: software that securely partitions the resources of a computer system into one or more VMs
    – Operating system level (also containers): our focus
    – Run-time library level
    – User application level (also process VMs)


Levels of abstraction for virtualization (2)

  • Relative merits of virtualization at the different levels [table in the original slides]

Process virtual machine (process VM)

  • Virtualization for a single process
    – Process VM: a virtual platform that executes a single process
    – Provides a virtual ABI or API environment to user applications
  • The program is compiled into an intermediate (portable) code, which is then executed by the runtime system
  • Examples: JVM, .NET CLR

Multiple instances of <application, runtime system> combinations


Virtual machine monitor (VMM)

  • A separate sw layer that (completely) hides the underlying hw and mimics the instruction set of the architecture
  • Different operating systems can be executed on the VMM independently and simultaneously
  • Examples: VMware, KVM, Xen, Parallels, VirtualBox

Multiple instances of <applications, operating system> combinations

VMM terminology

  • We consider system-level virtualization (VMM or hypervisor)
  • Host: the base platform on which the VMs are realized; it comprises:
    – The physical machine
    – The native operating system, if any
    – The VMM
  • Guest: everything that belongs to each single VM
    – The operating system and the applications executed within the VM


VMM classification

  • We distinguish:
    – System VMM vs. hosted VMM
    – Full virtualization vs. paravirtualization

System VMM or hosted VMM

At which level of the system architecture does the VMM sit?
    – Directly on the hardware (system VMM)
    – As an application on an existing OS (hosted VMM)

[Figure: a system VMM runs on the host hardware below the guests; a hosted VMM runs as an application on the host OS, below the guests]


System VMM or hosted VMM (2)

  • System VMM (type 1): runs directly on the hw and offers virtualization functionality integrated in a simplified OS
    – The hypervisor can have a micro-kernel architecture (basic functions only, no device drivers) or a monolithic one
    – Examples: Xen, KVM, VMware ESX, Hyper-V
  • Hosted VMM (type 2): runs on the host OS and accesses the hw resources through the system calls of the host OS
    – Interacts with the host OS through the ABI and emulates the ISA of virtual hw for the guest OSes
    – Advantage: can use the host OS to manage the peripherals and to exploit low-level services (e.g., resource scheduling)
    – Advantage: no need to modify the guest OS
    – Disadvantage: performance degradation with respect to a system VMM
    – Examples: Bochs, Parallels Desktop, VirtualBox

Full virtualization or paravirtualization

How do the VM and the VMM interact to access the physical resources?
    – Full virtualization
    – Paravirtualization

  • Full virtualization
    – The VMM exposes to each VM simulated hw interfaces that are functionally identical to those of the underlying physical machine
    – The VMM intercepts requests for privileged access to the hw (e.g., I/O instructions) and emulates their expected behavior
    – The VMM maintains a CPU context for each VM and shares the physical CPUs among all the VMs
    – Examples: KVM, VMware ESXi, Microsoft Hyper-V


Full virtualization or paravirtualization (2)

  • Paravirtualization
    – The VMM exposes to each VM simulated hw interfaces that are functionally similar (but not identical) to those of the underlying physical machine
    – The hw is not emulated; instead, a minimal sw layer (Virtual Hardware API) is created to manage the single VM instances and their isolation
    – Examples: Xen, Oracle VM (based on Xen), PikeOS

Qualitative comparison of the different VM solutions:
en.wikipedia.org/wiki/Comparison_of_platform_virtual_machines

Full virtualization

  • Advantages
    – No need to modify the guest OS
    – Complete isolation among the VM instances: security, ease of emulating different architectures
  • Disadvantages
    – More complex VMM
    – The processor's cooperation is needed for an effective implementation: why?


Issues in realizing system virtualization

  • The processor architecture operates according to at least 2 protection levels (rings): supervisor and user
    – Ring 0: maximum privileges
    – Ring 3: minimum privileges

[Figure: x86 architecture without virtualization]

  • With virtualization:
    – The VMM operates in supervisor mode (ring 0)
    – The guest OS and the applications (hence the VM) operate in user mode (ring 1 or ring 3)
    – Ring deprivileging problem: the guest OS runs in a ring that is not its own and cannot execute privileged instructions
    – Ring compression problem: since the applications and the guest OS run at the same level, the OS address space must be protected

Full virtualization (2)

  • How to solve ring deprivileging?
    – Trap-and-emulate: when the guest OS attempts to execute a privileged instruction (e.g., lidt in x86, i.e., load interrupt descriptor table), the processor raises an exception (trap) to the VMM and transfers control to it; the VMM checks the correctness of the requested operation and emulates its behavior
    – Non-privileged instructions issued by the guest OS are instead executed directly
  • How to realize the trap mechanism?
    – If the processor supports virtualization: at the hardware level → hardware-assisted CPU virtualization
    – If the processor does not support virtualization: at the software level → fast binary translation


Hardware-assisted CPU virtualization

  • Hardware-assisted CPU virtualization (Intel VT-x and AMD-V) provides two new CPU operating modes, called root mode and non-root mode, each supporting all four x86 protection rings
  • The VMM runs in root mode (Root-Ring 0), while all the guest OSes run in guest mode at their original privilege levels (Non-Root Ring 0): the ring deprivileging and ring compression problems disappear
  • The VMM can control guest execution through the control bits of hardware-defined structures

[Figure: x86 architecture with full virtualization and hardware-assisted CPU virtualization]
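As a quick practical aside (a sketch, not part of the original slides): on a Linux host you can check whether the processor exposes these extensions by inspecting the CPU flags, where Intel VT-x appears as vmx and AMD-V as svm.

    # Count logical CPUs advertising hardware virtualization support
    # (vmx = Intel VT-x, svm = AMD-V); a result greater than 0 means supported
    grep -Ec '(vmx|svm)' /proc/cpuinfo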

Fast binary translation

  • The trap-to-VMM mechanism for privileged instructions is offered only by processors with hardware support for virtualization (Intel VT-x and AMD-V)
    – IA-32 is not one of them: how to realize full virtualization in the absence of hw support?
  • Fast binary translation: the VMM scans the code before its execution and replaces the blocks containing privileged instructions with functionally equivalent blocks containing instructions that raise exceptions to the VMM
  • The translated blocks are executed directly on the hw and kept in a cache for possible future reuse
  • Higher VMM complexity and lower performance

[Figure: x86 architecture with full virtualization and binary translation]


Paravirtualization

  • Non-transparent virtualization solution
  • The guest OS kernel must be modified to let it invoke the special API exposed by the virtualization layer
  • Non-virtualizable OS instructions are replaced by hypercalls that communicate directly with the hypervisor
  • A hypercall is to a hypervisor what a syscall is to a kernel

[Figure: x86 architecture with paravirtualization]

Paravirtualization (2)

  • Pros (vs full virtualization):
    – Overhead reduction
    – Relatively easier and more practical implementation: the VMM simply transfers the execution of performance-critical operations (hard to virtualize) to the host
  • Cons:
    – Requires the source code of the OSes to be available
      • OSes that cannot be ported (e.g., Windows) can still take advantage of virtualization by using ad hoc device drivers that remap the execution of critical instructions to the virtual API exposed by the VMM
    – Cost of maintaining paravirtualized OSes


Paravirtualization: hypercall execution

  • The hypervisor (not the kernel) has the interrupt handlers installed
  • When a VM application issues a guest OS system call, execution jumps to the hypervisor, which handles it and then passes control back to the guest OS

Courtesy of “The Definitive Guide to the Xen Hypervisor” by D. Chisnall

Summing up the different approaches



VMM reference architecture

  • Dispatcher: the VMM entry point; it reroutes the instructions issued by the VM
  • Allocator (or scheduler): decides which system resources are to be provided to the VM
  • Interpreter: executes a proper routine when the VM executes a privileged instruction

VMM reference architecture: scheduler

  • Another scheduling layer with respect to classic CPU scheduling: the VMM scheduler
  • How to efficiently schedule virtual CPUs?
  • We will study scheduling in Xen

Memory virtualization

  • In a non-virtualized environment
    – One-level mapping: from virtual memory to physical memory, provided by the page tables
    – MMU and TLB hardware components optimize virtual memory performance
  • In a virtualized environment
    – All VMs share the same machine memory, and the VMM partitions memory among the VMs
    – Two-level mapping: from virtual memory to physical memory, and from physical memory to machine memory
  • Terminology
    – Host physical memory: the actual hw memory visible to the VMM
    – Guest physical memory: the memory visible to the guest OS
    – Guest virtual memory: the memory visible to applications; the contiguous virtual address space presented by the guest OS to applications

Two-level memory mapping

  • Going from guest virtual memory to host physical memory requires a two-level memory mapping:
    guest VA (virtual address) → guest PA (physical address) → host MA (machine address)
  • Guest physical address ≠ host machine address


Shadow page table

  • To avoid an unbearable performance drop due to the extra memory mapping, the VMM maintains shadow page tables (SPTs)
    – Direct guest virtual-to-host physical address mapping
  • The SPT maps guest virtual addresses to host physical (machine) addresses
    – The guest OS maintains its own virtual memory page table (PT) in the guest physical memory frames
    – For each guest physical memory frame, the VMM maps it to a host physical memory frame
    – The SPT maintains the mapping from guest virtual addresses to host machine addresses
    – The VMM needs to keep the SPTs consistent with the changes made by each guest OS to its PT

Challenges in memory virtualization with SPT

  • Address translation
    – The guest OS expects contiguous, zero-based physical memory, but the underlying machine memory may be discontiguous: the VMM must preserve this illusion
  • Page table shadowing
    – The SPT implementation is complex
    – The VMM intercepts paging operations and constructs a copy of the PTs
  • Overheads
    – VM exits add to the execution time
    – SPTs consume significant host memory
    – SPTs need to be kept synchronized with the guest PTs


Hw support for memory virtualization

  • SPT is a software-managed solution; let us also consider a hardware solution
    – Second Level Address Translation (SLAT) is the hardware-assisted solution for memory virtualization (Intel EPT and AMD RVI): the hardware translates the guest virtual address into the machine physical address
    – Using SLAT yields a significant performance gain with respect to SPT: around 50% for MMU-intensive benchmarks
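As a similar hedged check (an assumption about the Linux flag names, not from the slides): Intel EPT and AMD RVI/NPT typically surface in /proc/cpuinfo as the ept and npt flags respectively.

    # Check for second-level address translation support (ept = Intel, npt = AMD)
    grep -Ec '(ept|npt)' /proc/cpuinfo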

Case study: Xen

  • The most notable example of paravirtualization, www.xenproject.org (developed at the University of Cambridge)
    – Open-source type-1 (system VMM) hypervisor with a micro-kernel design
    – Offers to the guest OS a virtual interface (hypercall API) that the guest OS must use to access the machine's physical resources
    – With paravirtualization (PV), Xen requires PV-enabled guest OSes and PV drivers (now part of the Linux kernel as well as of other operating systems)
      • OSes ported to Xen: Linux, NetBSD, FreeBSD and OpenSolaris
    – Can also support hardware-assisted virtualization (HVM)
      • With HVM, unmodified guest OSes (e.g., Windows) can be used
    – Foundation for many products and platforms (e.g., Oracle VM and Qubes OS); powers some of the largest IaaS providers (e.g., Amazon, Rackspace)
      • However, Amazon has recently begun a shift to KVM


Xen: pros and cons

  • Pros
    – Thin hypervisor model
      • 300K lines of code on x86, 65K on Arm
      • Small footprint and interface (around 1 MB in size)
      • More robust and secure than other hypervisors
      • But still vulnerable to attacks: https://xenbits.xen.org/xsa/
    – Continuously improved
    – Flexibility in management
      • Tuning for performance
    – Minimal overhead (within 2.5%) with respect to the bare-metal machine without virtualization
    – Supports migration
  • Cons
    – I/O performance still remains challenging

Xen architecture

  • Goal of the Cambridge group (late 1990s)
    – Design a VMM capable of scaling to about 100 VMs running applications and services without any modifications to the ABI
  • First public release in 2003
  • Micro-kernel design
  • What can be paravirtualized?
    – Disk and network devices
    – Emulated platform: motherboard, device buses, BIOS, legacy boot
    – Privileged instructions and page tables (memory access)
      • Privileged instructions issued by a guest OS are replaced with hypercalls


Xen architecture: domains

  • Xen domain
    – Represents a VM instance
    – An ensemble of address spaces hosting a guest OS and the applications running on the guest OS
    – Runs on a virtual CPU
  • Dom0 (or control domain): a specialized domain devoted to the execution of Xen control functions and privileged instructions
    – The initial domain, started by the Xen hypervisor at boot
    – Special privileges: capability to access the hw directly, access to the system's I/O functions, and interaction with the other VMs
  • DomU (or unprivileged domain): user domain

Xen architecture (figure): https://wiki.xen.org/wiki/Xen_Project_Software_Overview


Xen architecture and guest OS management

  • The Xen hypervisor runs in the most privileged mode and controls the access of the guest OSes to the underlying hw
    – Domains run in ring 1
    – Applications run in ring 3

Dom0 components: XenStore and Toolstack

  • XenStore: an information storage space shared between domains, managed by the xenstored daemon
    – System-wide registry and naming service
    – Implemented as a hierarchical key-value storage
    – When values are changed, a watch function informs the listeners of changes to the keys in the storage they have subscribed to
    – Communicates with the guest VMs via shared memory, using Dom0 privileges
  • Toolstack: manages the VM lifecycle (create, shutdown, pause, migrate); see the sketch below
    – To create a new VM, a user provides a configuration file describing memory and CPU allocations and device configurations
    – The Toolstack parses this file and writes this information in XenStore
    – It takes advantage of Dom0 privileges to map guest memory, to load the kernel and the virtual BIOS, and to set up the initial communication channels with XenStore and with the virtual console when a new VM is created
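A minimal sketch of the corresponding lifecycle operations using xl, the Xen command-line toolstack (the guest name and configuration path are illustrative, not from the slides):

    # Create a VM from a configuration file describing memory, vCPUs, disks, ...
    xl create /etc/xen/guest1.cfg
    # List running domains (Dom0 plus the DomUs)
    xl list
    # Pause, resume and shut down a domain
    xl pause guest1
    xl unpause guest1
    xl shutdown guest1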


CPU schedulers in Xen

  • The job of a hypervisor's scheduler is to decide, among all the virtual CPUs (vCPUs) of the various VMs, which ones should execute on the host's physical CPUs (pCPUs) at any given point in time
    – A further scheduling level with respect to those provided by the OS (scheduling of processes and scheduling of user-level threads within processes)
  • Xen allows choosing among different CPU schedulers
    – We consider the Credit scheduler (the default scheduler in Xen)
  • Scheduling algorithm goals:
    – Make sure that domains get a “fair” share of the CPU
      • Proportional share algorithm: allocates pCPUs in proportion to the number of shares (weights) assigned to the vCPUs
    – Keep the CPU busy
      • Work-conserving algorithm: does not allow the CPU to be idle when there is work to be done
    – Schedule with low latency

Credit scheduler

  • Proportional fair-share and work-conserving scheduler
  • Each domain is assigned a weight and optionally a cap (tunable parameters; see the example below)
    – Weight: relative CPU allocation per domain (default 256)
    – Cap: the maximum amount of CPU a domain can use. If the cap is 0 (default), a vCPU can receive any extra CPU (i.e., work-conserving); a non-zero cap limits the amount of CPU a vCPU receives (e.g., 100 = 1 pCPU, 50 = 0.5 pCPU)
    – The scheduler transforms the weight into a credit allocation for each vCPU; as a vCPU runs, it consumes credits
  • For each pCPU, the scheduler maintains a queue of vCPUs, with all the under-credit vCPUs first, followed by the over-credit vCPUs; the scheduler picks the first vCPU in the queue
  • Automatically load-balances vCPUs across pCPUs on an SMP host
    – Before a pCPU goes idle, it considers the other pCPUs in order to find any runnable vCPU; this guarantees that no pCPU idles when there is runnable work in the system

wiki.xen.org/wiki/Credit_Scheduler
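For example, weights and caps can be inspected and tuned at runtime with the xl toolstack (a sketch; the domain name guest1 is hypothetical):

    # Show the current Credit scheduler parameters of all domains
    xl sched-credit
    # Give guest1 twice the default weight and cap it at half a pCPU
    xl sched-credit -d guest1 -w 512 -c 50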


Performance comparison of hypervisors

  • Developments in virtualization techniques and CPU architectures have reduced the performance cost of virtualization, but overheads still exist
    – Especially when multiple VMs compete for hw resources
  • We consider two performance comparison studies
    – See the papers on the course site
  • Take-home message
    – No one-size-fits-all solution exists
    – Different hypervisors show different performance characteristics for varying workloads

Performance comparison of hypervisors (2)

“A component-based performance comparison of four hypervisors” (IM 2013), http://bit.ly/2igBGZX
    – Microsoft Hyper-V, KVM, VMware vSphere and Xen, all with hardware-assisted virtualization settings
    – Analyzed components: CPU, memory, disk I/O and network I/O
  • Overall results
    – Performance can vary between 3% and 140% depending on the type of hw resource, but no single hypervisor always outperforms the others
    – vSphere performs the best, but the other three perform respectably
    – CPU and memory: lowest levels of overhead
    – Disk and network I/O: Xen shows overhead for small disk operations
  • Takeaway: consider the type of application, because different hypervisors may be best suited to different workloads


Performance comparison of hypervisors (3)

“Performance overhead among three hypervisors: an experimental study using Hadoop benchmarks” (BigData 2013), http://bit.ly/2ziKCZM

  • Uses Hadoop MapReduce apps to evaluate and compare the performance impact of three hypervisors
    – A commercial one (not disclosed), Xen, and KVM
  • For CPU-intensive benchmarks, negligible performance difference between the three hypervisors
  • Significant performance variations were seen for I/O-intensive benchmarks
    – The commercial hypervisor was better at disk writing, while KVM was better at disk reading
    – Xen was better when disk reading and writing were combined with CPU-intensive computations

Portability of virtual machines

  • VM image: a copy of the VM, which may contain an OS, data files, and applications
  • How to import and export VM images and avoid vendor lock-in?
  • Open Virtualization Format (OVF)
    – Open industry standard for packaging and distributing VMs
    – Virtual-platform agnostic
    – The VM configuration is specified in XML format within a file
    – Supported by many (but not all) virtualization products (VMware, VirtualBox, …)
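As an illustration (a sketch based on VirtualBox's CLI; the VM name myvm is hypothetical), a VM can be exported to an OVF/OVA package and re-imported on another host:

    # Export the VM "myvm" to a single-file OVF package (OVA)
    VBoxManage export myvm -o myvm.ova
    # Import the package on another host running a compatible product
    VBoxManage import myvm.ova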


VM dynamic resizing and migration

  • Two useful techniques to deploy and manage large-scale virtualized environments
    – Dynamic resizing, for vertical scaling (scale up, scale down)
    – Live migration
      • Move a VM between different physical machines (or data centers) without stopping it

VM dynamic resizing

  • Fine-grained mechanism compared to migrating or rebooting a VM
    – Example: an application running on a VM starts consuming a lot of resources and the VM starts running out of RAM and CPU → resize the VM
  • Pros: more cost-effective and faster than a VM reboot
  • Cons: not supported by all virtualization products and guest OSes
  • What can be resized without powering off and rebooting the VM?
    – Number of virtual CPUs
    – Memory size


VM dynamic resizing: CPU

  • Add or remove virtual CPUs (without switching off the machine)
  • Linux-based systems support CPU hot-plug/hot-unplug (e.g., KVM)
    – Uses information in the sysfs virtual file system (processor info in /sys/devices/system/cpu)
    – /sys/devices/system/cpu/cpuX for CPU X (X = 0, 1, 2, …)
    – To turn on CPU #5: echo 1 > /sys/devices/system/cpu/cpu5/online
    – To turn off CPU #5: echo 0 > /sys/devices/system/cpu/cpu5/online
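On the host side, hypervisor toolchains expose the same capability; a sketch with libvirt's virsh for a KVM guest (the guest name guest1 is hypothetical):

    # Hot-plug vCPUs into a running guest, up to its configured maximum
    virsh setvcpus guest1 4 --live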

VM dynamic resizing: memory

  • Based on memory ballooning
    – In KVM: the virtio_balloon driver
  • When the balloon inflates, the guest is left with less memory; under memory pressure the guest falls back on its swap area and, in the worst case, on the out-of-memory (OOM) killer
  • When the balloon deflates: more memory for the VM
    – The memory size cannot exceed maxMemory
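A sketch of ballooning-driven resizing through libvirt's virsh (the guest name guest1 is hypothetical; sizes default to KiB, with unit suffixes accepted in recent virsh versions):

    # Shrink the running guest to 1 GiB (inflates the balloon)
    virsh setmem guest1 1G --live
    # Grow it back to 2 GiB (deflates the balloon, bounded by maxMemory)
    virsh setmem guest1 2G --live
    # The maxMemory bound itself is changed in the persistent configuration
    virsh setmaxmem guest1 4G --config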


VM migration

  • Advantages of migration
    – Useful in clusters and virtual data centers to:
      • Consolidate the infrastructure
      • Gain flexibility in failover
      • Balance the load
  • Disadvantages and issues
    – Requires VMM support
    – Non-negligible migration overhead
    – Migration over a WAN is non-trivial

VM migration (2)

  • Approaches to migrate virtual machine instances between physical machines:
    – Stop and copy: the source VM is switched off and the VM image is transferred to the destination host, but the downtime can be too long
      • The VM image can be large and the network bandwidth limited
    – Live migration: the source VM keeps running during the migration


Live VM migration

  • Before starting the live migration
    – Setup phase: the destination host is selected (e.g., with a load balancing, energy efficiency, or server consolidation objective)
  • What to migrate? Memory, storage and network connections
  • How? Transparently to the applications running on the VM
    – Cost of live migration: there is still some application downtime

Live VM migration: storage and network

  • To migrate the storage:
    – Use storage shared by the source and destination hosts
      • SAN (Storage Area Network), or the cheaper NAS (Network Attached Storage), or a distributed file system
    – In the absence of shared storage: the source VMM saves all the data of the source VM in an image file, which is transferred to the destination host
  • To migrate the network connections:
    – The source VM has a virtual IP address (possibly also a virtual MAC address)
      • The VMM knows the mapping between virtual IPs and VMs
    – If source and destination are on the same IP subnet, no forwarding at the source is needed
      • The destination sends an unsolicited ARP reply advertising that the IP address has moved to a new location, so that the ARP tables are updated


Live VM migration: memory

  • To migrate the memory (including the CPU registers):
    1. Pre-copy phase: the VMM iteratively copies the pages from the source VM to the destination VM while the source VM keeps running
    2. Stop-and-copy phase: the source VM is stopped and only the dirty pages are copied
      • Downtime: from a few msec to a few sec, depending on the memory size, the application type and the network bandwidth
    3. Commitment and reactivation phases: the destination VM loads the state and resumes execution; the source VM is removed (and the source host possibly switched off)
  • This is called the pre-copy approach
    – The VM state is copied from source to destination before the VM execution resumes at the destination
    – It is the standard solution (e.g., in KVM)

VM live migration: overall process

Source: C. Clark et al., “Live Migration of Virtual Machines”, NSDI '05.


VM live migration: alternatives for memory

  • Pre-copy cannot transparently migrate workloads that are CPU- and/or memory-intensive
  • Alternative approach 1: post-copy
    – Post-copy moves the execution to the destination host at the beginning of the migration process and then transfers the memory pages on demand, as they are requested by the VM
  • Alternative approach 2: hybrid
    – A special case of post-copy migration: post-copy preceded by a limited pre-copy stage
    – Idea: a subset of the most frequently accessed memory pages is transferred before the VM execution is switched to the destination, so as to reduce the performance degradation after the VM is resumed
  • No standard implementation of the post-copy and hybrid approaches in current hypervisors

Approaches for migrating memory

Courtesy of C. Vojtech, http://bit.ly/2h7wSWB


Live VM migration and hypervisors

  • Live VM migration is supported by open-source and commercial hypervisors
    – E.g., KVM, Hyper-V, Xen, VirtualBox
    – It can be done using the virsh CLI tool:
      virsh migrate --live GuestName DestinationURL
  • Limited support for live VM migration over WAN
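Concretely (a sketch; the host and guest names are hypothetical), the destination is given as a libvirt connection URI, e.g., over SSH:

    # Live-migrate guest1 to the libvirt daemon on dest.example.com
    virsh migrate --live guest1 qemu+ssh://dest.example.com/system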

VM migration in WAN environments

  • So far we focused on VM migration within a single data center
  • How to achieve live migration of VMs across multiple geo-distributed data centers?


VM migration in WAN environments: storage

  • Some approaches to migrate storage in a WAN
    – Shared storage
      • Cons: the storage access time can be too slow
    – On-demand fetching
      • Transfer only some blocks to the destination, then fetch the remaining blocks from the source only when requested
      • Cons: it does not work if the source crashes
    – Pre-copy/write throttling
      • Pre-copy the disk image of the VM to the destination while the VM continues to run, keep track of the write operations on the source (delta), and then apply the delta on the destination
      • If the write rate at the source is too fast, use write throttling to slow down the VM so that the migration can proceed

VM migration in WAN environments: network

  • Some approaches to migrate network connections in a WAN
    – IP tunneling
      • Set up an IP tunnel between the old IP address at the source and the new VM IP address at the destination
      • Use the tunnel to forward all the packets that arrive at the source for the old IP address
      • Once the migration has completed and the VM can respond at its new location, update the DNS entry with the new IP address
      • Tear down the tunnel when no connections remain that use the old IP address
      • Cons: it does not work if the source crashes
    – Virtual Private Network (VPN)
      • Use an MPLS-based VPN to create the abstraction of a private network and address space shared by multiple data centers
    – Software-Defined Networking
      • Change the control plane: no need to change the IP address


OS-level virtualization

  • So far we have considered system-level virtualization
  • We now analyze operating system-level virtualization (or container-based virtualization)
  • It allows running multiple execution environments, isolated from one another, within a single OS
    – Such environments are called:
      • containers
      • jails
      • zones
      • virtual execution environments (VEs)

OS-level virtualization (2)

  • Each container has:
    – its own set of processes, file system, users, network interfaces with IP addresses, routing tables, firewall rules, …
  • The containers share the kernel of the same OS (e.g., Linux)


OS-level virtualization: mechanisms

  • Which kernel-level mechanisms are used in Unix-like OSes to realize containers?
    – chroot
    – cgroups and namespaces (Linux)
  • chroot (change root directory)
    – Command to change the root directory seen by the running processes
  • cgroups (control groups)
    – Mechanism to limit, account for and isolate the resource usage (CPU, memory, block I/O, network) of a set of processes

OS-level virtualization: mechanisms (2)

  • namespaces
    – Mechanism to isolate what a set of processes can see of the operating environment (processes, ports, files, …)
    – 6 namespaces
  • A sketch of all three mechanisms in action follows below
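A minimal sketch of the three mechanisms from a root shell (assuming a Linux host with cgroups v1 mounted under /sys/fs/cgroup; all paths and names are illustrative):

    # chroot: run a shell that sees /srv/rootfs as its root directory
    chroot /srv/rootfs /bin/sh

    # cgroups: cap a group of processes at ~50% of one CPU
    mkdir /sys/fs/cgroup/cpu/demo
    echo 50000  > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us
    echo 100000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_period_us
    echo $$     > /sys/fs/cgroup/cpu/demo/tasks    # move the current shell in

    # namespaces: start a shell with its own PID, mount and network namespaces
    unshare --pid --mount --net --fork /bin/sh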


OS-level virtualization: advantages

  • With respect to VMM-based virtualization:
    ✓ Almost no performance degradation
      Applications invoke the system calls directly; there is no need to go through a VMM
    ✓ Minimal startup and shutdown/cleanup times
      Seconds for a container, minutes for a VM
    ✓ High density
      Hundreds of instances on a single physical machine (PM); e.g., up to 8191 with Solaris Containers
    ✓ Smaller image (footprint)
      It does not include the OS kernel
    ✓ Memory pages can be shared among multiple containers running on the same PM
    ✓ Greater portability and interoperability for cloud applications
      The application in the container is independent of the execution environment

In a nutshell: lightweight (containers) vs. heavyweight (VMs)

OS-level virtualization: disadvantages

  • With respect to VMM-based virtualization:
    – Less flexibility
      • Kernels of different OSes cannot run simultaneously on the same PM
      • Only applications native to the supported OS (e.g., native Linux applications)
    – Weaker isolation
    – Higher risk of vulnerabilities
      • A single vulnerability in the OS kernel can compromise the whole system

[Figure: VMM-based (type 2) vs. container-based virtualization]


OS-level virtualization: products

  • Docker
    – Our case study (see the example after this list)
  • FreeBSD Jail
  • Solaris Zones/Containers
  • LXC (LinuX Containers)
    – Supported by the mainline Linux kernel
    – For full system containers (full OS image)
    – LXD: built on top of LXC, it is a system container manager
  • Virtuozzo
  • OpenVZ (for Linux)
  • IBM LPAR
  • rkt
    – Application container engine
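As a taste of the container workflow (a sketch; the image and container names are illustrative):

    # Run an nginx web server in a container, mapping host port 8080 to port 80
    docker run -d --name web -p 8080:80 nginx
    # List running containers, then stop and remove the one just started
    docker ps
    docker stop web && docker rm web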

OS-level virtualization: only Linux?

  • Windows and OS X now support container-based virtualization
    – Docker for Windows: integrated with Hyper-V virtualization, networking and file system
      https://www.docker.com/docker-windows
  • You can always install a VM with Linux and then use a container-based virtualization product inside the VM
    – Cons: performance loss
    – Cons: containerized apps must run on Linux (no OS X or Windows native applications)


Containers and DevOps

  • Container-based virtualization helps in the shift to DevOps as well as to Continuous Integration and Continuous Deployment (CI/CD)
    – Containers (more than VMs) allow developers to build code collaboratively through the sharing of images, while simplifying deployment to different infrastructures
  • DevOps = Development and Operations
    – “DevOps is a development methodology with a set of practices aimed at bridging the gap between Development and Operations, emphasizing communication and collaboration, continuous integration, quality assurance and delivery with automated deployment” (Jabbari et al., 2016)

Containers, microservices, serverless

  • With containers:
    – The application and all its dependencies are packed into a single package that can run almost anywhere
    – Fewer resources are used than with VMs
  • Containers are the key enabling technology for microservices and serverless computing
    – Future cloud-native applications will consist of both microservices and functions, often wrapped as containers
  • We will study microservices and serverless


Hypervisors and containers in the Cloud

  • Hypervisor-based virtualization: greater flexibility (different OSes on the same PM) and security
  • Container-based virtualization: smaller-size deployment
  • Containers and container development platforms are now offered as first-class Cloud services
    – Amazon EC2 Container Service (ECS)
    – Azure Container Service
    – Google Container Engine
    – Alauda (Container-as-a-Service solution), https://www.alauda.io
    – Docker Cloud
  • Some open questions
    – Containers on top of VMs?
    – Will container engines replace hypervisors in Cloud offerings?
    – Nested virtualization? (Possible in Azure, http://bit.ly/2zjqnZ2)

Container dynamic resizing and migration

  • As for VMs, we can resize and migrate containers
  • Dynamically resize the container limits (CPU, memory, I/O)
    – Possibly without restarting the container
    – Low-level solution: the cgroups limits can be changed on the fly (see the sketch below)
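For example (a sketch; the container name web is hypothetical), Docker rewrites the underlying cgroup limits of a running container through docker update:

    # Change the CPU and memory limits of a running container on the fly
    # (raising memory may also require adjusting --memory-swap if a swap limit is set)
    docker update --cpus 2 --memory 1g web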


Container migration

  • What about live migration of containers?
  • As for VM migration, we need to:
    – Save the state
    – Transfer the state
    – Restore from the state
  • State saving, transferring and restoring happen with the tasks frozen (migration downtime)
    – Use memory pre-copy or memory post-copy
  • More complicated than VM migration

Container migration (2)

  • Use the CRIU project and P.Haul
    – CRIU: for checkpointing/restoring in userspace
    – P.Haul: on top of CRIU, for pre-checks, memory pre-copy and post-copy, and file system migration
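A minimal checkpoint/restore sketch with the CRIU CLI (the PID and images directory are illustrative):

    # Checkpoint process 1234 into an images directory (tasks are frozen here)
    criu dump -t 1234 -D /tmp/ckpt --shell-job
    # After copying /tmp/ckpt, restore the process (possibly on another host)
    criu restore -D /tmp/ckpt --shell-job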


Container orchestration

  • Container orchestration: tools for container management that allow configuring, provisioning, deploying, monitoring, and dynamically controlling multi-container packaged applications
    – Used by organizations adopting containers for enterprise production to integrate and manage those containers at scale
  • Examples
    – Kubernetes
    – Docker Swarm
    – Marathon
    – Nomad
    – Amazon Elastic Container Service

Container management systems at Google

  • Application-oriented shift
    – “Containerization transforms the data center from being machine-oriented to being application-oriented”
  • Goal: allow container technology to operate at Google scale
    – Everything at Google runs as a container
    – Google launches 2 billion containers per week
  • Borg -> Omega -> Kubernetes
    – Borg and Omega: purely Google-internal systems
    – Kubernetes: open-source


Kubernetes

  • Google's open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure
    http://kubernetes.io
  • Features:
    – Portable: public, private, hybrid, multi-cloud
    – Extensible: modular, pluggable, hookable, composable
    – Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling of containers
  • Can run on various public or private cloud platforms (AWS, Azure, OpenStack, or Apache Mesos), and also on bare-metal machines
  • Offered as a Cloud service on Google Cloud Platform
    – Kubernetes management and deployment on the underlying infrastructure is up to the Cloud provider

Kubernetes pod

  • Pod: the basic unit that is scheduled in Kubernetes
    – A collection of (tightly coupled) containers with shared storage/network, and a specification for how to run the containers
    – A pod's containers are bundled and scheduled together, and run in a shared context
  • Users organize pods using labels
    – Label: an arbitrary key/value pair attached to a pod
    – E.g., role=frontend and stage=production

Kubernetes architecture

  • Organized according to the usual master-worker architecture
  • Master: the cluster's control plane; takes global decisions about the cluster
  • Nodes: worker machines; they may be VMs or physical machines
  • Kubelets: node agents (on the workers) ensuring that the pods running on the node are healthy and in the desired state
  • Cluster state backed by a distributed storage system: etcd, a highly available distributed key-value store


New lightweight approaches to virtualization

  • Deployment strategies examined so far

New lightweight approaches to virtualization (2)

  • With microservices and serverless, IoT and fog/edge computing, there is an increased demand for low-overhead virtualization techniques (Lightweight Virtualization, LV)
    – OS-level virtualization is not enough
    – Most Cloud applications do not require many of the services and tools coming with common OSes (shells, editors, coreutils, and package managers)
    – How to have tiny one-shot VMs that run on hypervisors with great density and that self-scale their resource needs?
    – How to improve security?
  • Lightweight OSes and unikernels
    – Basic idea: avoid OS overhead and reduce the attack surface


Lightweight operating systems

  • Lightweight operating systems
    – Minimal, container-focused OSes, typically with a monolithic kernel architecture
    – Container Linux, Fedora Atomic Host, RancherOS
  • CoreOS (now Red Hat) Container Linux, https://coreos.com/os/docs/latest/
    – Smaller, more compact Linux distribution
      • Only the minimal functionality required for deploying apps inside containers, together with built-in mechanisms for service discovery, container management and process management
    – Designed for large-scale deployments, mostly targeting enterprises, with a focus on automation, ease of application deployment, security, and scalability
    – Also runs on bare-metal servers
    – Will be merged with Atomic Host into a single project (Fedora CoreOS)

Unikernels

  • Unikernel: the library OS concept, http://unikernel.org
    – Single-purpose, single-language virtual machine hosted on a minimal environment
  • A specialized OS with the minimal set of libraries corresponding to the OS constructs required for the app to run, all in a single address space
    – Also called library OS or, sometimes, Cloud OS
    – Pros:
      • Lightweight and small (minimal memory footprint)
      • Fast (no context switching)
      • Secure (reduced attack surface)
    – Cons:
      • Only works in hypervisor-based virtual environments
      • Poor debugging
      • Single language runtime

See https://www.youtube.com/watch?v=oHcHTFleNtg


Unikernels (2)

  • Some unikernel products (and supported programming language): Xen MirageOS (OCaml), OSv, LinuxKit, IncludeOS (C++), ClickOS
  • OSv, http://osv.io
    – Unikernel designed for the Cloud
    – Intended to run on top of a hypervisor (e.g., KVM, Xen)
    – Achieves the isolation benefits of hypervisor-based systems, but avoids the overhead of the guest OS
    – Uses its own application-image system, not Docker

Performance of LV approaches

Some performance studies:
[1] “Time provisioning evaluation of KVM, Docker and Unikernels in a cloud platform”, IEEE CCGrid 2016
[2] “My VM is lighter (and safer) than your container”, SOSP 2017

  • Performance comparison: hypervisor (KVM) vs. lightweight virtualization (Docker and OSv)
  • The overhead introduced by containers is almost negligible
    – Fast instantiation times (at least 1 order of magnitude less than VMs)
    – Small per-instance memory footprints
    – High density on a single host
  • … but this is paid for in terms of security


Performance of LV approaches (2)

  • VM boot times grow linearly with the VM size (from [2])
  • Instance and OS/container startup times for 10, 20 and 30 instances (from [1])
    – Include the overhead on the overall provisioning time caused by the cloud platform (OpenStack)
  • Difficulties in securing containers are due to the unrelenting growth of the Linux syscall API over the years (from [2])

Performance of LV approaches (3)

  • A summary of properties
    – “Consolidate IoT edge computing with lightweight virtualization”, IEEE Network, 2018.

  • Let us take a look at:
    – Storage virtualization
    – Cluster virtualization

Storage virtualization

  • Decouple the physical organization of the storage from its logical representation
    – “Storage virtualization means that applications can use storage without any concern for where it resides, what the technical interface is, how it has been implemented, which platform it uses, and how much of it is available” (R. van der Lans)
  • Two primary types of storage virtualization
    – Block level
      • Aggregate multiple network storage devices into a single block-level substrate, present to users a logical space for data storage, and handle the process of mapping it to the actual physical location
    – File level
      • Decouple data access from the location where the files are physically stored (e.g., a distributed file system)


Storage virtualization: SAN

  • Storage Area Network (SAN): the most common solution for block-level storage virtualization
    – A SAN uses a network-accessible device, reached through a large-bandwidth connection, to provide storage facilities
  • Fibre Channel (FC): high-speed network technology primarily used to connect storage
    – Requires special-purpose cabling
    – For high performance requirements
  • Internet SCSI (iSCSI): IP-based protocol for linking data storage facilities
    – Uses existing network infrastructures
    – For moderate performance requirements

Virtual clusters

  • Virtual cluster nodes: either physical or virtual machines
    – The VMs/containers in a virtual cluster are interconnected logically by a virtual network across several physical networks
  • VMs/containers can be replicated and/or migrated on multiple physical nodes to achieve elasticity, fault tolerance, and disaster recovery
    – Also the size (number of nodes) of a virtual cluster can grow or shrink dynamically
  • How to efficiently store a large number of VM images?
    – VM images produced by hypervisors are large (typically 1-30 GB in size)
    – Container-based virtualization helps in reducing the image size


Sharing resources in virtual clusters

  • Need to run multiple frameworks on a single (physical or virtual) cluster
  • How to share the cluster resources among multiple, non-homogeneous frameworks executed in VMs/containers?
  • The classical solution: static partitioning
    – Is it efficient?

What we need

  • The “Datacenter as a Computer” idea by D. Patterson
    – Share resources to maximize their utilization
    – Share data among frameworks
    – Provide a unified API to the outside
    – Hide the internal complexity of the infrastructure from applications
  • The solution: a cluster-scale resource manager that employs dynamic partitioning


Apache Mesos

  • Dynamic partitioning
    – A cluster manager that provides a common resource sharing layer over which diverse frameworks can run
    – Abstracts the entire datacenter into a single pool of computing resources, simplifying running distributed systems at scale
    – A distributed system to run distributed systems on top of it

Apache Mesos (2)

  • Initially designed and developed at UC Berkeley
  • Now a top-level Apache open-source project, mesos.apache.org
  • Twitter and Airbnb were among the first users; it now supports some of the most popular apps (e.g., Siri, Uber, Yelp)
  • Cluster: a dynamically shared pool of resources

[Figure: dynamic partitioning vs. static partitioning]


Mesos in the data center

  • Where does Mesos fit as an abstraction layer in the datacenter?

Apache Mesos: architecture

  • Master-slave architecture
  • Slaves publish their available resources to the master
  • The master sends resource offers to the frameworks
  • Master election and service discovery via ZooKeeper


Apache Mesos: resource offers

  • The Mesos master offers resources to the frameworks
    – The framework is selected by the Dominant Resource Fairness (DRF) algorithm