Checkpoint-Restart for a Network of Virtual Machines


slide-1
SLIDE 1

Checkpoint-Restart for a Network of Virtual Machines

Rohan Garg, Komal Sodha, Zhengping Jin, Gene Cooperman

College of Computer and Information Science
Northeastern University
Boston, Massachusetts 02115
{rohgarg, komal, jinzp, gene}@ccs.neu.edu

September 24, 2013

slide-2
SLIDE 2

Outline

Motivation
Related Work
Design and Implementation
  DMTCP and Plugins
  Generic Checkpoint-Restart for Virtual Machines
  Checkpointing a network of VMs
Experimental Results
Conclusion

slide-5
SLIDE 5

Motivation

◮ Parallel Computations on the Cloud
◮ Not everybody uses MPI: IaaS (Infrastructure as a Service)
◮ Flexibility and maintainability

Imagine if you could...

◮ deploy complex software configuration in a secure environment
◮ gain high reliability by running within a virtual machine that is set to take snapshots every minute
◮ checkpoint a network of virtual machines including the state of a parallel computation

slide-8
SLIDE 8

Related Work

◮ Virtual Machine checkpointing
  ◮ QEMU, KVM, Xen, VMware: Snapshotting
  ◮ Remus: High Availability on Xen-based servers
  ◮ VM-µCheckpoint: High frequency checkpointing on Xen
  ◮ Emulab: Distributed checkpointing with Xen; record-replay of network packets
  ◮ BlobSeer
◮ Checkpoint-restart
  ◮ BLCR: Kernel-space
  ◮ CryoPid2: Process Pods; 32-bit only
  ◮ CRIU: User-space; Linux containers
  ◮ DMTCP: User-space; distributed

slide-11
SLIDE 11

DMTCP and Plugins

DMTCP:

◮ Distributed MultiThreaded Checkpointing
◮ User-space
◮ Transparent checkpointing
◮ Distributed processes
◮ Wide range of supported applications: MPI, Perl/Python, GDB, X-windows, Matlab, R

DMTCP Plugins:

◮ DMTCP extensions; shared libraries
◮ Short, well-defined API
◮ Add support to handle the checkpoint-restart of specific resources

slide-12
SLIDE 12

DMTCP Plugins: Features

Two essential features:

◮ Wrapper Functions:
  ◮ Interpose on library and system function calls
  ◮ Process the arguments; call the interposed function; and return the (possibly modified) return value
◮ DMTCP Events:
  ◮ Notify plugin of several events: Pre-checkpoint, Post-restart, etc.
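
The two plugin features can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is written in Python rather than as a shared library, and every name in it (`HandlePlugin`, `make_real_open`, the event strings) is hypothetical and not part of the DMTCP plugin API. The wrapper records its argument, calls the interposed function, and returns a modified value (a stable virtual handle); the event hook uses the recorded arguments to re-create the real resource after a restart.

```python
"""Toy model of the two plugin features: a wrapper function and event hooks.

Every name below is hypothetical; none of it is the DMTCP plugin API.
"""
import itertools
from typing import Callable, Dict, List


def make_real_open() -> Callable[[str], int]:
    """Build a stand-in for a library call that returns a fresh handle per call."""
    counter = itertools.count(100)

    def real_open(path: str) -> int:
        return next(counter)

    return real_open


class HandlePlugin:
    """Hands out stable virtual handles and re-creates real ones after restart."""

    def __init__(self, next_fn: Callable[[str], int]) -> None:
        self._next_fn = next_fn              # the interposed ("real") function
        self._paths: Dict[int, str] = {}     # virtual handle -> path argument
        self._real: Dict[int, int] = {}      # virtual handle -> current real handle
        self._virtual_ids = itertools.count(1)
        self.events: List[str] = []

    def open(self, path: str) -> int:
        """Wrapper: record the argument, call the real function, return a virtual handle."""
        real = self._next_fn(path)
        virtual = next(self._virtual_ids)
        self._paths[virtual] = path
        self._real[virtual] = real
        return virtual

    def on_event(self, event: str) -> None:
        """Event hook: after a restart, re-open every recorded resource."""
        self.events.append(event)
        if event == "post-restart":
            for virtual, path in self._paths.items():
                self._real[virtual] = self._next_fn(path)

    def real_handle(self, virtual: int) -> int:
        """Look up the real handle currently backing a virtual handle."""
        return self._real[virtual]


if __name__ == "__main__":
    plugin = HandlePlugin(make_real_open())
    handle = plugin.open("/dev/demo")    # the application only ever sees this value
    plugin.on_event("pre-checkpoint")
    plugin.on_event("post-restart")      # real handle is replaced, virtual one is not
    print(handle, plugin.real_handle(handle))
```

The design point of the sketch is that the application keeps using the virtual handle across a restart, while the plugin quietly swaps the real handle underneath it.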

slide-14
SLIDE 14

Generic Checkpoint-Restart for VMs: Background

Generic VM Architecture

[Figure: generic VM architecture. Labels: VM shell; guest VM (user-space component); hardware description (peripherals, IRQ, etc.); vCPU threads (vCPU0 ... vCPUn, the vCPUs for the virtual cores); async I/O threads; user-space memory with tables shared with kernel space; kernel module for the VM, with kernel-space memory and tables shared with user space.]

Special Cases:

◮ Xen, VMware ESXi Server: very thin hypervisor; bare-metal; no host OS
◮ QEMU: Software emulation; user-space

slide-15
SLIDE 15

Generic Checkpoint-Restart for VMs: Background

◮ DMTCP:
  ◮ Handle user-space memory, file descriptors, sockets, etc.

% dmtcp_checkpoint qemu <args-for-qemu>
% dmtcp_command --checkpoint
% dmtcp_restart ckpt-qemu-img.dmtcp

slide-17
SLIDE 17

Checkpoint-Restart for KVM: Key Ideas

◮ DMTCP KVM Plugin:
  ◮ Launch empty VM shell
  ◮ Copy the checkpoint image (they’re just bits) from the old checkpointed VM
  ◮ Restore kernel VM driver parameters
  ◮ Patch kernel VM driver parameters

[Figure: the generic VM architecture, with the VM shell holding an empty hardware description.]

% dmtcp_checkpoint \
    --with-plugin dmtcp_kvm_plugin.so \
    qemu -enable-kvm <args-for-qemu>
% dmtcp_command --checkpoint
% dmtcp_restart ckpt-qemu-img.dmtcp

slide-19
SLIDE 19

Challenges for checkpointing a network of VMs

Challenges:

◮ Synchronization between VMs
◮ Re-generating the virtual network
◮ Saving and restoring in-flight data

slide-23
SLIDE 23

Challenges for checkpointing a network of VMs: Solutions

◮ Synchronization between VMs
  ◮ DMTCP Co-ordinator
◮ Re-generating the virtual network
◮ Saving and restoring in-flight data
  ◮ DMTCP TUN/TAP Plugin: Heuristic:
    ◮ Quiesce the user-application threads
    ◮ Wait for a fixed time: assume all packets have arrived
    ◮ Write the checkpoint image (if additional packets continue to arrive, try again)
  ◮ Alternative approach: broadcast a cookie

% dmtcp_checkpoint \
    --with-plugin dmtcp_kvm_plugin.so \
    --with-plugin dmtcp_tun_plugin.so \
    qemu -enable-kvm <args-for-qemu>
% dmtcp_command --checkpoint
% dmtcp_restart ckpt-qemu-img.dmtcp
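
The wait-and-retry heuristic for in-flight data can be sketched as a short loop. This is a minimal illustration, not the TUN/TAP plugin's code: the function name, the `packets_seen` and `write_checkpoint` callbacks, and the default timings are all assumptions made for the example, and the caller is assumed to have already quiesced the user-application threads.

```python
"""Sketch of the wait-then-checkpoint heuristic for in-flight packets.

All names and default values are hypothetical, chosen for illustration.
"""
import time
from typing import Callable


def drain_then_checkpoint(
    packets_seen: Callable[[], int],
    write_checkpoint: Callable[[], None],
    quiet_period: float = 0.05,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the attempt number on which a consistent image was written.

    packets_seen() must return a running count of packets received so far.
    """
    for attempt in range(1, max_attempts + 1):
        # Wait for a fixed time and assume all in-flight packets have arrived.
        sleep(quiet_period)
        before = packets_seen()
        write_checkpoint()
        # If nothing arrived while the image was written, the image is usable.
        if packets_seen() == before:
            return attempt
        # Otherwise packets are still arriving: discard this image and retry.
    raise TimeoutError("packets kept arriving; no consistent checkpoint written")
```

A fixed wait cannot prove that the network is empty, which is presumably why the slide lists broadcasting a cookie as an alternative.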

slide-25
SLIDE 25

Experimental Results: Setup

◮ Network of Virtual Machines
  ◮ 12-node cluster (at University of Alabama, Birmingham)
  ◮ Each node: 12-core Intel Xeon (1.6 GHz) server; 24 GB RAM
  ◮ KVM/QEMU with Tap
  ◮ Host OS: 64-bit CentOS; Linux Kernel 2.6.32
  ◮ Guest OS: Ubuntu 12.04 Server
◮ Others:
  ◮ Btrfs (nested VMs)
  ◮ DMTCP optimizations
  ◮ Commodity computer

slide-26
SLIDE 26

Experimental Results: Scalability

[Figure: checkpoint and restart times in seconds versus number of nodes (2 to 12). Series: Checkpoint, Restart.]

Checkpoint-restart of HPCC benchmark on a Gigabit Ethernet cluster. (Memory allocated in each case is 1024 MB.)

slide-28
SLIDE 28

Experimental Results: Optimizations - I

◮ Btrfs filesystem
  ◮ Fast, incremental checkpoints
  ◮ Copy-on-write filesystem
  ◮ Going to be the default filesystem (soon?)
  ◮ Nested VMs
◮ DMTCP optimizations
  ◮ Forked checkpointing: copy-on-write: fork a child to write checkpoint; parent continues
  ◮ mmap-based fast restart: on-demand paging from the checkpoint image
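
The two DMTCP optimizations rest on standard Unix mechanisms, which the sketch below demonstrates on a toy scale. It is an analogy under stated assumptions, not DMTCP's implementation: the "state" is a plain `bytearray` rather than a process image, the function names are hypothetical, and it needs a Unix-like system where `os.fork` is available. The child process sees a copy-on-write snapshot of memory as of the fork, so the parent can keep mutating its state while the image is written; on restart, `mmap` lets pages be read from the image on demand instead of loading the whole file up front.

```python
"""Toy illustration of forked checkpointing and mmap-based restart.

Function names are hypothetical; requires a Unix-like OS (uses os.fork).
"""
import mmap
import os
import tempfile


def forked_checkpoint(state: bytearray, path: str) -> int:
    """Fork a child that writes `state` to `path`; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: it holds a copy-on-write snapshot of memory as of the fork.
        status = 1
        try:
            with open(path, "wb") as image_file:
                image_file.write(state)
            status = 0
        finally:
            os._exit(status)   # never return into the parent's code path
    return pid                 # parent: continues immediately


def mmap_restart(path: str) -> mmap.mmap:
    """Map the checkpoint image read-only; pages are faulted in on access."""
    with open(path, "rb") as image_file:
        # mmap keeps its own duplicate of the descriptor, so closing is safe.
        return mmap.mmap(image_file.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    state = bytearray(b"A" * 4096)
    image_path = os.path.join(tempfile.mkdtemp(), "ckpt.img")
    child = forked_checkpoint(state, image_path)
    state[0:1] = b"B"          # the parent keeps running and changes its state
    os.waitpid(child, 0)
    image = mmap_restart(image_path)
    print(image[0:1])          # the image reflects the state at fork time
    image.close()
```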

slide-29
SLIDE 29

Experimental Results: Optimizations - II

[Figure: time in seconds versus number of nodes (1, 2, 4). Series: Ckpt w/ Btrfs, Ckpt w/o Btrfs, Restart w/ Btrfs, Restart w/o Btrfs.]

Snapshotting up to four distributed VMs running HPCC under KVM/QEMU. The Btrfs filesystem is used to snapshot the filesystem using nested VMs. (Memory allocated in each case is 384 MB. The size of the guest filesystem is 2 GB.)

slide-30
SLIDE 30

Experimental Results: Optimizations - II

[Figure: checkpoint time in seconds versus number of nodes (1, 2, 4, 8, 12). Series: Ckpt, Ckpt w/ F/C, Ckpt w/ F/R, Ckpt w/ F/C + F/R.]

Checkpoint of HPCC benchmark on a Gigabit Ethernet cluster, as influenced by DMTCP’s optional optimizations: forked checkpoint (F/C) and fast restart (F/R). DMTCP’s default gzip compression of checkpoint images is incompatible with DMTCP F/R, and so is not used in those cases. (Memory allocated in each case is 1024 MB.)

slide-31
SLIDE 31

Experimental Results: Optimizations - II

[Figure: restart time in seconds versus number of nodes (1, 2, 4, 8, 12). Series: Restart, Restart w/ F/C, Restart w/ F/R, Restart w/ F/C + F/R.]

Restart of HPCC benchmark on a Gigabit Ethernet cluster, as influenced by DMTCP’s optional optimizations: forked checkpoint (F/C) and fast restart (F/R). DMTCP’s default gzip compression of checkpoint images is incompatible with DMTCP F/R, and so is not used in those cases. (Memory allocated in each case is 1024 MB.)

slide-33
SLIDE 33

Conclusion

Summary

◮ Generic mechanism for checkpoint-restart: QEMU (user-space), Lguest (paravirtualization), QEMU/KVM (hardware-assisted virtualization)
◮ Btrfs: fast, incremental snapshots
◮ Low maintenance burden, high flexibility: plugin with 400 LOC

slide-34
SLIDE 34

Questions?