Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center, Kasidit Chanchio, Vasabilab



slide-1
SLIDE 1

Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center

Kasidit Chanchio Vasabilab Dept of Computer Science, Faculty of Science and Technology, Thammasat University http://vasabilab.cs.tu.ac.th

ALICE O2 Presentation

slide-2
SLIDE 2

Outline

  • Introduction and problems
  • Checkpointing mechanisms
  • Our Proposal
    – Time-bound Live Checkpointing (TLC)
    – A Scalable Checkpointing Technique
  • Conclusion and Future Works
slide-3
SLIDE 3

Introduction

  • Today, applications require more CPUs and RAM
    – Big Data Analysis
    – Large Scale simulation
    – Scientific Computation
    – Legacy Applications, etc.
  • Cloud computing has become a common platform for large-scale computations
    – Amazon offers a VM with 8 vcpus and 68.4GiB RAM
    – Google offers a VM with 8 vcpus and 52GB RAM
  • Large-scale applications can have long execution times
    – In case of failures, users must restart apps from the beginning

slide-4
SLIDE 4

How do we handle server crashes?

  • Checkpointing: The state of long-running apps should be saved regularly so that the computation can be recovered from the last saved state if failures occur
  • It usually takes a long time to save the state of CPU- and memory-intensive apps
    – Downtime could also be high
  • The Parallel File System (PFS) can be a bottleneck and slow down the entire system when saving the state of multiple nodes simultaneously


slide-5
SLIDE 5

What is Checkpointing?

  • Periodically Save Computation State to Persistent Storage for recovery if failures occur

Checkpointing levels (diagram layers, top to bottom: Application-Level, User Level, OS-Level, VM-Level, Linux/Hardware):

  • Application-Level: Modify App
    – More development work
    – Know exactly what to save
  • User Level: Link with Chkpt library
    – Depends on execution environments
    – Don't have to recompile app
  • OS-Level: Modify Kernel
    – Depends on Kernel version
    – Can reuse executable
  • VM-Level: Modify Hypervisor
    – Must handle all VM state
    – Transparent to Guest OS/App
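The basic idea of periodic save and recovery can be sketched at the application level. This is a minimal illustration only, not the VM-level mechanism this deck proposes; the function names (`save_checkpoint`, `load_checkpoint`, `run`), the pickle file format, and the `crash_at` parameter are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to a temp file, then atomically rename it over `path`."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())       # make sure the data reaches persistent storage
    os.replace(tmp_path, path)     # a reader never sees a half-written checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps, path, interval=100, crash_at=None):
    """Sum 0..total_steps-1, saving state every `interval` steps.

    `crash_at` is a hypothetical hook that simulates a server failure.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated server crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(state, path)
    return state["acc"]
```

If a run crashes, calling `run` again with the same `path` resumes from the last saved step rather than from step 0, at the cost of redoing the work done since that checkpoint.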

slide-6
SLIDE 6

VM Checkpointing

  • Highly Transparent to Guest OS & Applications
  • Save all apps and execution environments
  • Techniques:
    – Stop & Save [kvm]
    – Copy on Write & Chkpt Thread [vmware ESXi]
    – Copy to Memory Buffer [TLC 2009]
    – Live replication to a backup host [Remus]
    – Time-bound Live Checkpointing [TLC]

slide-7
SLIDE 7

1. Stop and Save

Diagram: VM, Hypervisor, Local or Shared Storage

  • Stop the VM to save state to disk
  • Long downtime and checkpoint time
  • Saving to shared storage is necessary if the VM is to be restored on a new host
  • Saving to shared storage causes higher checkpoint time

slide-8
SLIDE 8

2. Copy on Write

Diagram: Hypervisor, VM, Local or Shared Storage (one memory scan)

  • The hypervisor creates a thread to scan memory and save unmodified pages
  • If the VM modifies a page, the hypervisor copies the original contents of that page directly to disk
  • Can cause high downtime if a large number of pages are modified in a short period of time
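The copy-on-write behavior can be illustrated with a small single-threaded simulation. This is a toy model under stated assumptions, not the VMware ESXi implementation: pages are list entries, "disk" is a dictionary, and the class and method names (`CowCheckpoint`, `scan_step`, `vm_write`) are hypothetical.

```python
class CowCheckpoint:
    """Toy copy-on-write checkpoint over a list of pages (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # live VM memory, one entry per page
        self.saved = {}        # page number -> contents at checkpoint start ("disk")
        self.cursor = 0        # next page the memory scan will visit

    def scan_step(self):
        """Scan thread: save the next page unless it was already saved."""
        if self.cursor < len(self.memory):
            if self.cursor not in self.saved:
                self.saved[self.cursor] = self.memory[self.cursor]
            self.cursor += 1

    def vm_write(self, page, value):
        """VM write: preserve the original contents first, then modify the page."""
        if page not in self.saved:
            self.saved[page] = self.memory[page]   # copy the original to "disk"
        self.memory[page] = value

    def done(self):
        return self.cursor >= len(self.memory)


memory = ["a", "b", "c", "d"]
ckpt = CowCheckpoint(memory)
ckpt.scan_step()           # saves page 0
ckpt.vm_write(2, "C2")     # page 2 not yet scanned: original "c" is copied first
ckpt.vm_write(0, "A2")     # page 0 already saved: no extra copy needed
while not ckpt.done():
    ckpt.scan_step()
image = [ckpt.saved[i] for i in range(len(memory))]
```

The checkpoint image reflects memory as it was when the scan started, even though the VM kept writing. Every write to a not-yet-saved page adds a synchronous copy, which is where the high downtime under heavy write bursts comes from.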

slide-9
SLIDE 9

3. Memory Buffer

Diagram: Hypervisor, VM, Memory, Local/Shared Storage (one memory scan)

  • The hypervisor creates a thread to scan memory and save unmodified pages
  • The hypervisor stops the VM to copy dirty pages to a memory buffer, and writes the buffer to disk later when checkpointing is done
  • Needs a large amount of memory

slide-10
SLIDE 10

4. Replication

Diagram: Hypervisor, VM, Source Host, Backup Host, Memory, Local/Shared Storage

  • The hypervisor stops the VM periodically to copy and sync state information with a backup host
  • Great for High Availability
  • Needs to reserve resources on a backup host for the VM throughout its lifetime

slide-11
SLIDE 11

Time-bound Live Migration

  • TLC is based on the Time-bound, Thread-based Live Migration (TLM) [CCgrid 2014]
  • Basic Principles of TLM:
    – TLM finishes within a bounded period of time, i.e., one round of memory scan
    – Performs with best efforts to minimize downtime
    – Dynamically adjusts VM computation speed to reduce downtime by balancing dirty page generation rate and available data transfer bandwidth
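The balancing principle can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not TLM's actual control algorithm: it assumes the dirty page rate scales linearly with VM speed and that one memory scan is limited only by raw bandwidth. The function names are hypothetical, and the sample numbers are taken from the MG Class D and experimental-setup slides in this deck.

```python
def vm_speed_factor(dirty_pages_per_sec, transfer_pages_per_sec):
    """Fraction of full speed at which the VM dirties pages no faster than
    they can be transferred (assumes dirty rate scales linearly with speed)."""
    if dirty_pages_per_sec <= 0:
        return 1.0
    return min(1.0, transfer_pages_per_sec / dirty_pages_per_sec)


def scan_time_bound_sec(ram_bytes, bandwidth_bytes_per_sec):
    """Idealized time for one round of memory scan, ignoring protocol overhead."""
    return ram_bytes / bandwidth_bytes_per_sec


# 600,000 pages/sec dirtied vs. at most 100,000 pages/sec transferred
factor = vm_speed_factor(600_000, 100_000)

# 36 GB of RAM (taken as 36e9 bytes) over a 10 Gbps link (1.25e9 bytes/sec)
bound = scan_time_bound_sec(36e9, 1.25e9)
```

Under these assumptions the VM would need to run at about one sixth of full speed for transfer to keep pace, and one scan of RAM would take roughly 29 seconds. Real behavior depends on page locality and on how the hypervisor actually throttles the guest.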

slide-12
SLIDE 12

TLM Design

VM State Transfer

  • Add two threads to the source hypervisor
    – Mtx: scan entire RAM
    – Dtx: new dirty pages
  • Use two receiver threads at the destination

Optimization

  • Manage resource allocation and handle downtime minimization

Diagram labels: VM State Transfer, Downtime reduction

slide-13
SLIDE 13

Kvm Migration and Downtime (over a 10 Gbps network)

kvm-1.x-<tolerable downtime>

  • 1. Hard to find the right tolerable downtime
  • 2. The same parameter may cause very different migration behaviors

TLM

slide-14
SLIDE 14

TLM: Kernel MG Class D

  • 36GB VM RAM, 27.3GB WSS
  • Low locality: 600,000 pages can be updated in one second, but pages are transferred at no more than 100,000 pages/sec
  • Reasonable Bandwidth

(1) (2) (3) 1 Gbps network

slide-15
SLIDE 15

Time-bound Live Checkpointing (TLC)

  • Based on TLM
  • Send state evenly to a set of Distributed Memory Servers (DMS)
  • Let each DMS save the state to local disk when Stage 3 finishes
  • Each DMS can write state to PFS later
  • Perf: migtime + 1/3 of the time to save the entire VM state to local disk

slide-16
SLIDE 16

Time-bound Live Checkpointing (TLC)

  • Based on TLM
  • Each DMS loads state info from local disk
  • When the loading is done, the DMSs send data simultaneously to the restored VM
  • The restored VM puts the transmitted state info in the right place and resumes computation
  • Perf: 1/3 of traditional VM restoration time
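Spreading VM state over several Distributed Memory Servers and reassembling it on restore can be sketched as follows. The slides do not say how pages are assigned to servers, so round-robin by page number is an assumption here, as are the function names `distribute` and `restore`. Each server is modeled as a dictionary rather than a networked process.

```python
def distribute(pages, num_servers):
    """Spread pages evenly over servers, round-robin by page number (assumed policy)."""
    servers = [dict() for _ in range(num_servers)]
    for page_no, contents in enumerate(pages):
        servers[page_no % num_servers][page_no] = contents
    return servers


def restore(servers, num_pages):
    """Rebuild VM memory by putting each received page back at its page number."""
    memory = [None] * num_pages
    for store in servers:
        for page_no, contents in store.items():
            memory[page_no] = contents
    return memory


pages = [f"p{i}" for i in range(10)]
servers = distribute(pages, 3)
restored = restore(servers, len(pages))
```

Because each server holds about 1/N of the state, N servers can write to or read from their local disks in parallel. The 1/3 figures quoted on these slides would be consistent with three servers, though the deck does not state the count.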

slide-17
SLIDE 17

How do we make TLC checkpointing scale?

  • Define a set of hosts, namely a circle
  • Let each host in a circle take its turn to checkpoint while the rest help save its state

slide-18
SLIDE 18

Scalable Checkpointing

  • Put each host in a circle into a separate group
slide-19
SLIDE 19

Scalable Checkpointing

  • VMs on hosts in the same group checkpoint at the same time

VMs in the same group could be communicating with one another
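The circle and group arrangement can be sketched as a simple schedule. This is one reading of the slides, under the assumption that all circles have the same size: group k takes the k-th host of every circle, so hosts in one circle never checkpoint at the same time. The function names and host labels are hypothetical.

```python
def build_groups(circles):
    """Group k takes the k-th host of every circle (assumes equal-sized circles)."""
    size = len(circles[0])
    if any(len(circle) != size for circle in circles):
        raise ValueError("this sketch assumes equal-sized circles")
    return [[circle[k] for circle in circles] for k in range(size)]


def schedule(circles):
    """Return one (checkpointing hosts, helpers per host) pair per round."""
    rounds = []
    for k, group in enumerate(build_groups(circles)):
        helpers = {
            circle[k]: [host for host in circle if host != circle[k]]
            for circle in circles
        }
        rounds.append((group, helpers))
    return rounds


circles = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
groups = build_groups(circles)
rounds = schedule(circles)
```

In each round, every checkpointing host is helped only by the other hosts in its own circle, so the load of saving state stays local to the circle while whole groups checkpoint together.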

slide-20
SLIDE 20

Scalable Checkpointing

  • VMs on hosts in the same group checkpoint at the same time
slide-21
SLIDE 21

Scalable Checkpointing

  • Every DMS on a helping host saves state to local disk
slide-22
SLIDE 22

Scalable Checkpointing

  • DMS can later save state to PFS
slide-23
SLIDE 23

Scalable Checkpointing

  • Or, DMS can collaborate to replicate state information
slide-24
SLIDE 24

Conclusion and Future Works

  • We propose a Time-bound Live Checkpointing (TLC) mechanism
    – Finishes in a bounded time period (proportional to RAM size)
    – Provides best-effort downtime minimization
    – Reduces dirty page generation rate to minimize downtime
  • We propose using a set of Distributed Memory Servers (DMS) to speed up checkpointing
  • We propose a method to perform checkpointing at a large scale
  • We have implemented TLC and DMS and conducted preliminary experiments

  • Next, we will evaluate the scalable checkpointing ideas
  • Thank you. Questions?
slide-25
SLIDE 25

BACKUP

slide-26
SLIDE 26

Experimental Setup

  • Each VM uses 8 vcpus
  • NAS Parallel Benchmark v3.3
    – OpenMP Class D (and MPI Class D in paper)
  • The VM migrates from the source to the dest computer
  • Two separate networks:
    – 10 Gbps for migration
    – 1 Gbps for iperf
  • Iperf fires from a supporting computer
  • The VM disk image of the migrating VM is on NFS

slide-27
SLIDE 27

TLM Performance: Kernel IS Class D

  • 36GB VM Ram, 34.1GB WSS
  • Updates a large number of pages continuously
  • VM page transfer rate is about half of the dirty page generation rate
  • The migration times of TLM and TLM.1S are close
  • TLM downtime is about 0.68 of that of TLM.1S