Using Application-Driven Checkpointing for Hot Spare High Availability
Antti Kantee, Cubical Solutions Ltd.
The Target: 2n hotspare
Antti Kantee <pooka@cubical.fi>, 2004 .
imagine some mission critical service
fact: all hardware will break some day
for each server, install a spare server
if something bad happens to the server, the spare will take over
machine crash
service crash
service state will be migrated
not migrating service state: Cold spare easy ...
adding support should not cripple service
The Presentation
problem
solution
boosting performance
implementation
adaption
conclusions
standing ovation
(or the more likely rotten tomato scene)
The Problem
how to preserve state?
classic approach: checkpoint
usually below process-level ==> transparent to process
problem with implementations of the classic approach: they rarely apply to networked services
checkpointing will take very long
checkpoint will be huge
feature support limited
no external communication allowed (fd’s ...)
thread support usually nonexistent
The Solution
Application-Driven Checkpointing
instead of checkpointing being transparent to the process, do the opposite: leave checkpointing entirely up to the application
good:
checkpoint exactly the right data
checkpoint at exactly the right time
possible to get extended feature support
bad:
need to modify each application separately
What is process state?
in other words: what do we want to capture?
memory:
for a C program, this is pretty much WYGIWYG
loads and stores are directly mapped to memory
might be more difficult for actual programming languages
"other stuff":
file descriptors / sockets
threads
you name it ...
Application-Checkpointing: Naive Approach
simply write out pieces of the previous two sets (memory and "other stuff")
the application decides what gets stored
instead of the application deciding what does not get stored
need to figure out some serialization form for information
for memory this is pretty easy: (addr, len, content)
for "other stuff" equally easy, just more laborious
we could just write out everything in the process context when checkpoint() is called
but that doesn’t perform especially well
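The (addr, len, content) form for memory can be sketched as below. This is only an illustration: the struct layout and the names cpt_record, cpt_write_region() and cpt_restore_region() are assumptions made here, not the framework's actual format, and the restore step assumes the region is mapped at the same address as when it was saved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record header: an (addr, len) pair followed by len bytes. */
struct cpt_record {
    uint64_t addr;  /* address of the region in the checkpointed process */
    uint64_t len;   /* number of content bytes that follow the header */
};

/* Append one (addr, len, content) record to out. Returns 0 on success. */
int cpt_write_region(FILE *out, const void *region, size_t len)
{
    struct cpt_record hdr;

    assert(out != NULL && region != NULL);
    memset(&hdr, 0, sizeof(hdr));
    hdr.addr = (uint64_t)(uintptr_t)region;
    hdr.len = (uint64_t)len;
    if (fwrite(&hdr, sizeof(hdr), 1, out) != 1)
        return -1;
    if (len > 0 && fwrite(region, 1, len, out) != len)
        return -1;
    return 0;
}

/* Read the next record and copy its content back to the recorded address.
 * Assumes the region is mapped at that same address.
 * Returns 1 if a record was restored, 0 at end of file, -1 on error. */
int cpt_restore_region(FILE *in)
{
    struct cpt_record hdr;
    void *dst;

    assert(in != NULL);
    if (fread(&hdr, sizeof(hdr), 1, in) != 1)
        return feof(in) ? 0 : -1;
    dst = (void *)(uintptr_t)hdr.addr;
    if (hdr.len > 0 && fread(dst, 1, (size_t)hdr.len, in) != hdr.len)
        return -1;
    return 1;
}
```

Calling cpt_write_region() once per registered area gives the naive "write everything on each checkpoint" behaviour, which is exactly the cost the following slides try to reduce.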
Boosting Performance
two common & cheap solutions
asynchronous
do not checkpoint in process context
while the actual cost is still there, hopefully the application does not take such a heavy penalty
incremental
write out deltas only
the more you checkpoint, the more you save
hmm, where have I heard that before?
Asynchronous checkpointing
many employ fork()
get new execution context
memory "protected" by copy-on-write
ok, that was easy
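A minimal sketch of the fork() technique, assuming a single contiguous state buffer and a snapshot written to a plain file. The helper names cpt_async_start() and cpt_async_wait() are made up for this illustration and are not part of the framework.

```c
#define _DEFAULT_SOURCE  /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start writing len bytes at state to path from a forked child.
 * The parent returns immediately with the child's pid (-1 if fork failed). */
pid_t cpt_async_start(const char *path, const void *state, size_t len)
{
    FILE *out;
    int ok;
    pid_t pid;

    assert(path != NULL && state != NULL);
    fflush(NULL);            /* do not duplicate buffered stdio output */
    pid = fork();
    if (pid != 0)
        return pid;          /* parent (pid > 0) or fork error (pid == -1) */

    /* Child: sees memory as it was at fork() time, thanks to copy-on-write. */
    out = fopen(path, "wb");
    if (out == NULL)
        _exit(1);
    ok = fwrite(state, 1, len, out) == len;
    if (fclose(out) != 0)
        ok = 0;
    _exit(ok ? 0 : 1);
}

/* Reap the checkpoint child. Returns 0 if the snapshot was fully written. */
int cpt_async_wait(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent may keep modifying state right after cpt_async_start() returns. Those writes go to the parent's own copies of the pages, so the child still writes out the values from the moment of the fork.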
Incremental checkpointing
many employ mprotect() and signal handlers
userspace solution
MMU already tracks modification information
used by pagedaemon
wire pages, and pagedaemon no longer needs that info
asking MMU perhaps not the best option, but it was easy to implement ;-)
some archs have soft "dirty" bit, not in MMU
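For comparison, the userspace variant that "many employ" (mprotect() plus a signal handler) can be sketched roughly as follows. This is not the MMU-based kernel approach described on the slide. The track_* names and the fixed 64-page limit are assumptions for illustration, and calling mprotect() from a signal handler works on common systems but is not guaranteed by POSIX.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANON and sigaction under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACK_MAX_PAGES 64

static char *track_base;        /* start of the tracked area */
static size_t track_npages;     /* number of tracked pages */
static size_t track_pagesize;
static volatile sig_atomic_t track_dirty[TRACK_MAX_PAGES];

/* A store hit a read-only tracked page: mark it dirty and make it writable,
 * so the faulting instruction succeeds when it is retried. */
static void track_on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)track_base;
    size_t page;

    (void)ctx;
    if (track_base == NULL || addr < base ||
        addr >= base + track_npages * track_pagesize) {
        signal(sig, SIG_DFL);   /* a genuine crash: let it happen */
        return;
    }
    page = (addr - base) / track_pagesize;
    track_dirty[page] = 1;
    mprotect(track_base + page * track_pagesize, track_pagesize,
             PROT_READ | PROT_WRITE);
}

/* Allocate npages of zeroed memory and start tracking writes to it. */
void *track_alloc(size_t npages)
{
    struct sigaction sa;
    void *p;

    if (npages == 0 || npages > TRACK_MAX_PAGES)
        return NULL;
    track_pagesize = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, npages * track_pagesize, PROT_READ,
             MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = track_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);   /* some systems report SIGBUS instead */
    memset((void *)track_dirty, 0, sizeof(track_dirty));
    track_base = p;
    track_npages = npages;
    return p;
}

/* Copy the per-page dirty flags to out, clear them and write-protect the
 * area again. Returns the number of pages modified since the last call. */
size_t track_collect(unsigned char *out)
{
    size_t i, n = 0;

    assert(out != NULL);
    for (i = 0; i < track_npages; i++) {
        out[i] = (unsigned char)track_dirty[i];
        n += out[i];
        track_dirty[i] = 0;
    }
    mprotect(track_base, track_npages * track_pagesize, PROT_READ);
    return n;
}
```

Each checkpoint then only needs to write the pages flagged by track_collect(). The price is one fault per page per checkpoint interval, which is part of why asking the MMU directly is attractive.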
Pulling memory checkpointing together
two new syscalls: cptctl() and cptfork()
cptctl:
add/remove checkpoint areas monitored for deltas
query changes
cptfork:
mostly same as fork()
check for modified pages
Additional State
we cannot take file descriptors, sockets, signals, threads etc. from a memory dump
kernel state, including lots of structure linkage, so transfer as opaque data not possible
use a syscall augmentation-style approach:
for most entities, it is possible to query the current state from the kernel
when restoring, use normal syscalls to "trick" kernel
so basically handle this entirely in userspace
unfortunately TCP is not supported :(
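For a regular file, the query-then-replay idea might look like the sketch below. The struct fd_state layout and the two helper names are assumptions for illustration. The path is supplied by the application, because a descriptor cannot portably be asked which path it was opened from.

```c
#define _DEFAULT_SOURCE  /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical serialisable description of one open regular file. */
struct fd_state {
    int   fd;         /* descriptor number the application uses */
    int   flags;      /* file status flags from fcntl(F_GETFL) */
    off_t offset;     /* current file offset */
    char  path[256];  /* path given by the application at registration */
};

/* Query the kernel for the current state of fd. Returns 0 on success. */
int fd_state_save(int fd, const char *path, struct fd_state *st)
{
    assert(path != NULL && st != NULL);
    if (strlen(path) >= sizeof(st->path))
        return -1;
    st->fd = fd;
    st->flags = fcntl(fd, F_GETFL);
    st->offset = lseek(fd, 0, SEEK_CUR);
    if (st->flags == -1 || st->offset == (off_t)-1)
        return -1;
    strcpy(st->path, path);
    return 0;
}

/* Rebuild the descriptor with ordinary syscalls: open the file, seek to the
 * saved offset and move it onto the original descriptor number.
 * Returns the descriptor number, or -1 on failure. */
int fd_state_restore(const struct fd_state *st)
{
    int flags, fd;

    assert(st != NULL);
    flags = st->flags & ~(O_CREAT | O_EXCL | O_TRUNC); /* never create/truncate */
    fd = open(st->path, flags);
    if (fd == -1)
        return -1;
    if (lseek(fd, st->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    if (fd != st->fd) {
        if (dup2(fd, st->fd) == -1) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return st->fd;
}
```

Because the descriptor comes back under its original number, application code that stored that number in checkpointed memory keeps working unchanged after a restore.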
Dealing with Multithreading
do not record program counter, register values, etc.
treat a thread like any other "additional state"
record "worker function" address and argument only
for each registered thread, at restore a thread is created and the worker function is called
problem: locking
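Recording only the worker function and its argument could look roughly like this with POSIX threads. The registry, thread_register() and thread_restore_all() are hypothetical stand-ins and not the framework's hsthreadreg(). The sketch also assumes the function address stays valid across a restore, i.e. the same binary loaded at the same address.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Everything that is recorded per thread: entry point and argument. */
struct thread_reg {
    void *(*worker)(void *);
    void *arg;
};

static struct thread_reg thread_table[MAX_THREADS];
static int thread_count;

/* Register a thread for checkpointing. Returns its slot, or -1 if full. */
int thread_register(void *(*worker)(void *), void *arg)
{
    assert(worker != NULL);
    if (thread_count >= MAX_THREADS)
        return -1;
    thread_table[thread_count].worker = worker;
    thread_table[thread_count].arg = arg;
    return thread_count++;
}

/* At restore: create one fresh thread per registered entry. Each worker
 * starts from the top and must continue from the restored memory contents.
 * Returns the number of threads started. */
int thread_restore_all(pthread_t *out)
{
    int i, started = 0;

    for (i = 0; i < thread_count; i++) {
        if (pthread_create(&out[started], NULL, thread_table[i].worker,
                           thread_table[i].arg) != 0)
            break;
        started++;
    }
    return started;
}

/* Example worker: resumes counting from the value in checkpointed memory. */
void *count_worker(void *arg)
{
    int *counter = arg;

    while (*counter < 10)
        (*counter)++;
    return NULL;
}
```

The worker has to be written so that re-entering it from the top is safe. That is also where the locking problem shows up: a lock that was held at checkpoint time has no owner among the newly created threads.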
Additional Support
define spare machine(s)
move snapshots of runtime state to spare machines
TCP/IP, IP/carrier pigeon, whatever suits you
detect failures
leave that up to the application to define ;-)
provide a simple "ping"-approach in the framework
direct network traffic to "spare" after master has crashed and process has been rebuilt
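The decision logic behind a simple "ping" detector can be sketched as a timeout on the last reply seen from the master. The struct and function names are assumptions, and the transport that actually sends pings and receives replies is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping kept by the spare about its master. */
struct ping_monitor {
    time_t last_reply;  /* when the master last answered a ping */
    double timeout_s;   /* silence longer than this counts as a failure */
};

void ping_init(struct ping_monitor *m, time_t now, double timeout_s)
{
    assert(m != NULL && timeout_s > 0.0);
    m->last_reply = now;
    m->timeout_s = timeout_s;
}

/* Call whenever a ping reply arrives from the master. */
void ping_reply(struct ping_monitor *m, time_t now)
{
    m->last_reply = now;
}

/* True once the master has been silent for longer than the timeout. */
bool ping_master_failed(const struct ping_monitor *m, time_t now)
{
    return difftime(now, m->last_reply) > m->timeout_s;
}
```

Passing the current time in as a parameter keeps the check independent of any clock source. A real deployment would also want to tolerate a few lost pings before the spare takes over.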
Application Interface to Framework
Philosophy: everything that can be supported application-transparently should be, but it should not prevent any tricks the application might want to pull
generally what needs to be done:
reserve checkpoint memory with hsmalloc()
group essential memory into e.g. structs
register some additional info: hsfdreg(), hsthreadreg()
sprinkle checkpoints into appropriate places: hscpt()
restore handled in framework also
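To make the workflow concrete, here is a toy stand-in for the allocate / checkpoint / restore cycle. The real signatures of hsmalloc() and hscpt() are not given in the slides, so the toy_* functions below are purely hypothetical. They keep one in-process snapshot in a second buffer instead of shipping it to a spare machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096

/* All checkpointed memory lives in one arena, so a checkpoint is one copy. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static unsigned char snapshot[ARENA_SIZE];
static size_t arena_used, snapshot_used;

/* Toy counterpart of hsmalloc(): bump-allocate from the arena. */
void *toy_hsmalloc(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    void *p;

    if (rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;                /* overflow or arena exhausted */
    p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* Toy counterpart of hscpt(): snapshot everything allocated so far. */
void toy_hscpt(void)
{
    assert(arena_used <= ARENA_SIZE);
    memcpy(snapshot, arena, arena_used);
    snapshot_used = arena_used;
}

/* Toy restore: roll the arena back to the last snapshot. */
void toy_hsrestore(void)
{
    memcpy(arena, snapshot, snapshot_used);
    arena_used = snapshot_used;
}

/* Essential application state grouped into one struct, as suggested above. */
struct game_state {
    int score;
    int level;
};
```

An application would call toy_hsmalloc() once for its struct game_state, update it as usual, and call toy_hscpt() at points where the state is consistent, for example after each completed move.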
Adapting
kernel portion should in theory be adaptable to other systems
Linux & FreeBSD & Chorus investigated
userspace library should be portable code as-is
adapting application is an interesting question
most UNIX programs are stateless
state tied to TCP
persistence state dealt with by application-specific methods
tetris was easy to adapt
sqlite almost equally easy
Conclusions