Using Application-Driven Checkpointing for Hot Spare High Availability
Antti Kantee
Cubical Solutions Ltd.

The Target: 2n hot spare
Antti Kantee <pooka@cubical.fi>, 2004
- imagine some mission-critical service
- fact: all hardware will break some day
- for each server, install a spare server
- if something bad happens to the server, the spare takes over:
  - machine crash
  - service crash
- service state will be migrated
  - not migrating service state: cold spare (easy ...)
- adding support should not cripple the service

The Presentation
- problem
- solution
- boosting performance
- implementation
- adaptation
- conclusions
- standing ovation (or the more likely rotten-tomato scene)

The Problem
- how to preserve state?
- classic approach: checkpointing
  - usually done below the process level ==> transparent to the process
- problems in implementations of the classic approach:
  - they rarely apply to networked services
  - checkpointing takes very long
  - the checkpoint will be huge
  - feature support is limited
    - no external communication allowed (fd's ...)
    - thread support is usually nonexistent

The Solution
- Application-Driven Checkpointing
- instead of making checkpointing transparent to the process, do the opposite: leave checkpointing entirely up to the application
- good:
  - checkpoint exactly the right data
  - checkpoint at exactly the right time
  - possible to get extended feature support
- bad:
  - each application must be modified separately

What is process state?
- in other words: what do we want to capture?
- memory
  - for a C program, this is pretty much WYGIWYG: loads and stores are directly mapped to memory
  - might be more difficult for "actual" programming languages
- "other stuff"
  - file descriptors / sockets
  - threads
  - you name it ...

Application Checkpointing: Naive Approach
- simply write out the pieces in the previous two sets
  - the application decides what gets stored, instead of deciding what does not get stored
- need to figure out a serialization format for the information
  - for memory this is pretty easy: (addr, len, content)
  - for "other stuff" it is equally easy, just more laborious
- we could just write out everything in the process context when checkpoint() is called
  - but that doesn't perform especially well
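
A minimal sketch of the naive (addr, len, content) serialization, assuming a stdio stream as the checkpoint target. The names struct cpt_rec, cpt_write_area() and cpt_restore_all() are hypothetical and not part of the framework described in these slides; the byte layout of the record header is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record header; the slide only fixes the triple
 * (addr, len, content), not a byte layout. */
struct cpt_rec {
    uint64_t addr;  /* address of the area in the checkpointed process */
    uint64_t len;   /* number of content bytes following this header */
};

/* Append one (addr, len, content) record. Returns 0 on success. */
static int cpt_write_area(FILE *fp, const void *addr, size_t len)
{
    struct cpt_rec rec = { (uint64_t)(uintptr_t)addr, (uint64_t)len };

    assert(fp != NULL);
    if (fwrite(&rec, sizeof rec, 1, fp) != 1)
        return -1;
    if (len > 0 && fwrite(addr, 1, len, fp) != len)
        return -1;
    return 0;
}

/* Copy every recorded blob back to its recorded address. Only valid when the
 * restoring process has the same areas mapped at the same addresses. */
static int cpt_restore_all(FILE *fp)
{
    struct cpt_rec rec;

    assert(fp != NULL);
    while (fread(&rec, sizeof rec, 1, fp) == 1) {
        void *dst = (void *)(uintptr_t)rec.addr;
        if (rec.len > 0 && fread(dst, 1, (size_t)rec.len, fp) != rec.len)
            return -1;  /* truncated checkpoint */
    }
    return feof(fp) ? 0 : -1;
}
```

Restoring by raw address is what makes this "naive": it silently assumes an identical memory layout on the restoring side, and it writes everything every time, which is the performance problem the next slides address.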

Boosting Performance
- two common & cheap solutions
- asynchronous
  - do not checkpoint in the process context
  - the actual cost is still there, but hopefully the application does not take as heavy a penalty
- incremental
  - write out deltas only
  - the more you checkpoint, the more you save
  - hmm, where have I heard that before?

Asynchronous checkpointing
- many implementations employ fork()
  - get a new execution context
  - memory is "protected" by copy-on-write
- ok, that was easy
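
The fork() trick can be sketched as follows, assuming the snapshot goes to an already-open file descriptor (a pipe, file or socket). The helpers write_all(), async_checkpoint() and checkpoint_wait() are hypothetical names used only for illustration; the point is that the parent returns immediately while the child writes out memory as it was at fork() time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all of buf to fd, retrying short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Snapshot [addr, addr + len) to fd without stopping the caller. The child
 * sees memory exactly as it was at fork() time (copy-on-write), so the parent
 * may keep modifying it. Returns the child's pid, or -1 if fork() failed. */
static pid_t async_checkpoint(int fd, const void *addr, size_t len)
{
    pid_t pid;

    assert(addr != NULL || len == 0);
    pid = fork();
    if (pid == 0)
        _exit(write_all(fd, addr, len) == 0 ? 0 : 1);  /* child: dump and leave */
    return pid;  /* parent: back to serving requests immediately */
}

/* Reap a checkpoint child; returns 0 if it wrote the whole snapshot. */
static int checkpoint_wait(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The child uses _exit() rather than exit() so it does not run the parent's atexit handlers or flush its stdio buffers a second time.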

Incremental checkpointing
- many implementations employ mprotect() and signal handlers
  - a userspace solution
- the MMU already tracks modification information
  - it is used by the pagedaemon
  - wire the pages, and the pagedaemon no longer needs that info
- asking the MMU is perhaps not the best option, but it was easy to implement ;-)
  - some archs have a software-emulated "dirty" bit, not one in the MMU
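
For comparison, here is a sketch of the userspace mprotect()/signal-handler approach the slide mentions, not the MMU-based kernel approach. Tracked pages start read-only; the first store to each page faults, the handler records the page as dirty and makes it writable, and a checkpoint reports the dirty pages and write-protects them again. The names track_alloc(), collect_dirty() and MAX_PAGES are hypothetical. Calling mprotect() from a signal handler is common practice but is not guaranteed async-signal-safe by POSIX.

```c
#define _DEFAULT_SOURCE  /* MAP_ANONYMOUS on glibc in strict C modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64

static char *area;        /* tracked area, page aligned */
static size_t npages;     /* pages in the area */
static size_t pagesz;
static volatile sig_atomic_t dirty[MAX_PAGES];  /* 1 = written since last checkpoint */

/* First store to a read-only page lands here: mark the page dirty and make it
 * writable; the kernel then restarts the faulting store. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t a = (uintptr_t)si->si_addr, base = (uintptr_t)area;
    (void)ctx;

    if (area == NULL || a < base || a >= base + npages * pagesz) {
        signal(sig, SIG_DFL);  /* a genuine crash: re-fault with the default action */
        return;
    }
    size_t idx = (a - base) / pagesz;
    dirty[idx] = 1;
    mprotect(area + idx * pagesz, pagesz, PROT_READ | PROT_WRITE);
}

/* Allocate n write-protected pages and start tracking writes to them. */
static void *track_alloc(size_t n)
{
    struct sigaction sa;
    void *p;

    assert(area == NULL);  /* this sketch tracks a single area */
    if (n == 0 || n > MAX_PAGES)
        return NULL;
    pagesz = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, n * pagesz, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);  /* some systems report protection faults as SIGBUS */
    npages = n;
    area = p;
    return p;
}

/* At checkpoint time: report the dirty pages (a checkpointer would copy exactly
 * these out), then clear their bits and write-protect them again. */
static size_t collect_dirty(size_t *out, size_t max)
{
    size_t k = 0;

    for (size_t i = 0; i < npages && k < max; i++) {
        if (!dirty[i])
            continue;
        out[k++] = i;
        dirty[i] = 0;
        mprotect(area + i * pagesz, pagesz, PROT_READ);
    }
    return k;
}
```

Each first write after a checkpoint costs a fault plus two user/kernel transitions. That overhead is why querying modification information the MMU already keeps is attractive.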

Pulling memory checkpointing together
- two new syscalls: cptctl() and cptfork()
- cptctl():
  - add/remove checkpoint areas monitored for deltas
  - query changes
- cptfork():
  - mostly the same as fork()
  - checks for modified pages
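
To show how the two syscalls fit together, here is a userspace model of their semantics. It is not the real interface: the slides give no prototypes for cptctl() or cptfork(), so cpt_area_add(), cpt_area_remove(), cpt_query(), cpt_fork(), cpt_reap() and the chunk constants are all hypothetical. In this model the caller sets the dirty bit with cpt_mark(), where the kernel would get it from the MMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CPT_CHUNK 4096     /* tracking granularity, standing in for the page size */
#define CPT_MAX_AREAS 8
#define CPT_MAX_CHUNKS 64  /* per area */

/* Hypothetical model of the per-area state cptctl() manages in the kernel. */
struct cpt_area { char *base; size_t nchunks; unsigned char dirty[CPT_MAX_CHUNKS]; };
static struct cpt_area areas[CPT_MAX_AREAS];
static size_t nareas;

/* "add": monitor an area; all chunks start dirty, so the first checkpoint is full. */
static int cpt_area_add(void *base, size_t len)
{
    if (nareas == CPT_MAX_AREAS || len == 0 || len % CPT_CHUNK != 0 ||
        len / CPT_CHUNK > CPT_MAX_CHUNKS)
        return -1;
    struct cpt_area *a = &areas[nareas++];
    a->base = base;
    a->nchunks = len / CPT_CHUNK;
    for (size_t i = 0; i < a->nchunks; i++)
        a->dirty[i] = 1;
    return 0;
}

/* "remove": stop monitoring the area that starts at base. */
static int cpt_area_remove(void *base)
{
    for (size_t i = 0; i < nareas; i++)
        if (areas[i].base == base) {
            areas[i] = areas[--nareas];  /* order of areas does not matter */
            return 0;
        }
    return -1;
}

/* Stand-in for the MMU's modified bit: record a write to address p. */
static void cpt_mark(const void *p)
{
    uintptr_t u = (uintptr_t)p;
    for (size_t i = 0; i < nareas; i++) {
        uintptr_t b = (uintptr_t)areas[i].base;
        if (u >= b && u < b + areas[i].nchunks * CPT_CHUNK)
            areas[i].dirty[(u - b) / CPT_CHUNK] = 1;
    }
}

/* "query changes": number of chunks modified since the last checkpoint. */
static size_t cpt_query(void)
{
    size_t n = 0;
    for (size_t i = 0; i < nareas; i++)
        for (size_t j = 0; j < areas[i].nchunks; j++)
            n += areas[i].dirty[j];
    return n;
}

/* Like fork(), but the child writes only modified chunks as (address, contents)
 * and the parent starts a new delta period. A short write counts as failure. */
static pid_t cpt_fork(int fd)
{
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        for (size_t i = 0; i < nareas; i++)
            for (size_t j = 0; j < areas[i].nchunks; j++) {
                char *c = areas[i].base + j * CPT_CHUNK;
                uint64_t addr = (uint64_t)(uintptr_t)c;
                if (!areas[i].dirty[j])
                    continue;
                if (write(fd, &addr, sizeof addr) != (ssize_t)sizeof addr ||
                    write(fd, c, CPT_CHUNK) != CPT_CHUNK)
                    _exit(1);
            }
        _exit(0);
    }
    for (size_t i = 0; pid > 0 && i < nareas; i++)
        for (size_t j = 0; j < areas[i].nchunks; j++)
            areas[i].dirty[j] = 0;
    return pid;
}

/* Reap a checkpoint child; 0 if it wrote every dirty chunk. */
static int cpt_reap(pid_t pid)
{
    int st;
    return (waitpid(pid, &st, 0) == pid && WIFEXITED(st) && WEXITSTATUS(st) == 0) ? 0 : -1;
}
```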

Additional State
- we cannot take file descriptors, sockets, signals, threads etc. from a memory dump
  - they are kernel state with lots of structure linkage, so transferring them as opaque data is not possible
- use a syscall-augmentation-style approach:
  - for most entities, it is possible to query the current state from the kernel
  - when restoring, use normal syscalls to "trick" the kernel into the same state
  - so this is handled entirely in userspace
- unfortunately TCP is not supported :(
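
As an illustration of "query the state, then trick the kernel with normal syscalls", here is a sketch for regular-file descriptors only. It assumes the application supplies the path when it registers the descriptor, because the kernel cannot portably report it. struct fd_state, fd_capture() and fd_restore() are hypothetical names, and the 256-byte path limit is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical saved description of a regular-file descriptor. */
struct fd_state {
    int fd;          /* descriptor number to recreate */
    char path[256];  /* path supplied by the application at registration */
    int flags;       /* status flags from fcntl(F_GETFL) */
    off_t offset;    /* file offset from lseek(SEEK_CUR) */
};

/* Checkpoint: query the kernel for the descriptor's current state. */
static int fd_capture(int fd, const char *path, struct fd_state *st)
{
    assert(path != NULL && st != NULL);
    if (strlen(path) >= sizeof st->path)
        return -1;
    st->fd = fd;
    strcpy(st->path, path);
    st->flags = fcntl(fd, F_GETFL);
    st->offset = lseek(fd, 0, SEEK_CUR);
    return (st->flags < 0 || st->offset < 0) ? -1 : 0;
}

/* Restore: rebuild the same state with ordinary open/dup2/lseek calls. */
static int fd_restore(const struct fd_state *st)
{
    int fd = open(st->path, st->flags & (O_ACCMODE | O_APPEND | O_NONBLOCK));

    if (fd < 0)
        return -1;
    if (fd != st->fd) {
        if (dup2(fd, st->fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return lseek(st->fd, st->offset, SEEK_SET) == st->offset ? 0 : -1;
}
```

A TCP connection cannot be rebuilt this way. Its sequence numbers and peer state live in the kernel and on the remote host, and no ordinary syscall recreates them, which is consistent with the slide's note that TCP is not supported.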

Dealing with Multithreading
- do not record the program counter, register values, etc.
- treat a thread like any other "additional state"
  - record only the "worker function" address and argument for each registered thread
  - at restore, a thread is created and the worker function is called
- problem: locking
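
A sketch of recording threads as (worker function, argument) pairs with POSIX threads. thread_register(), thread_restore_all(), struct counter and count_worker() are hypothetical. The sketch assumes the restoring process runs the same binary at the same load address, since a raw function pointer means nothing otherwise, and that the worker keeps all of its progress in checkpointed memory reachable from its argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void *(*worker_fn)(void *);

/* Hypothetical per-thread record: no PC or registers, just fn and arg. */
struct thread_rec {
    worker_fn fn;
    void *arg;
};

#define MAX_THREADS 16
static struct thread_rec threads[MAX_THREADS];
static size_t nthreads;

/* Start a thread and remember how to start it again after a failover. */
static int thread_register(pthread_t *tid, worker_fn fn, void *arg)
{
    if (nthreads == MAX_THREADS || pthread_create(tid, NULL, fn, arg) != 0)
        return -1;
    threads[nthreads].fn = fn;
    threads[nthreads].arg = arg;
    nthreads++;
    return 0;
}

/* Restore: call each worker from the top again. tids must hold nthreads entries. */
static int thread_restore_all(pthread_t *tids)
{
    for (size_t i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, threads[i].fn, threads[i].arg) != 0)
            return -1;
    return 0;
}

/* Example worker: its progress lives entirely in *arg, so restarting it
 * from the top resumes where it left off. */
struct counter {
    pthread_mutex_t lock;
    int value;
    int target;
};

static void *count_worker(void *arg)
{
    struct counter *c = arg;

    assert(c != NULL);
    for (;;) {
        pthread_mutex_lock(&c->lock);
        if (c->value >= c->target) {
            pthread_mutex_unlock(&c->lock);
            return NULL;
        }
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
}
```

The locking problem shows up directly here. If a checkpoint is taken while some thread holds c->lock, the mutex is restored in the locked state with no owner to release it, so checkpoints have to be placed where no registered lock is held.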

Additional Support
- define spare machine(s)
- move snapshots of runtime state to the spare machines
  - TCP/IP, IP over carrier pigeon, whatever suits you
- detect failures
  - leave that up to the application to define ;-)
  - provide a simple "ping" approach in the framework
- direct network traffic to the spare after the master has crashed and the process has been rebuilt
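
The simple "ping" approach can be sketched as a timeout-based failure detector on the spare. The framework's actual protocol is not described in the slides, so struct hb_monitor and its functions are hypothetical. Times are passed in as seconds from any monotonic clock, which keeps the decision logic independent of networking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical spare-side view of the master, fed by a periodic ping. */
struct hb_monitor {
    double interval;   /* expected seconds between pings */
    int max_missed;    /* pings that may be missed before declaring failure */
    double last_seen;  /* time the last ping arrived */
};

static void hb_ping_received(struct hb_monitor *m, double now)
{
    m->last_seen = now;
}

/* True once the master has been silent for more than max_missed intervals;
 * the spare then rebuilds the process and takes over the traffic. */
static bool hb_master_failed(const struct hb_monitor *m, double now)
{
    assert(m->interval > 0 && m->max_missed > 0);
    return now - m->last_seen > m->interval * m->max_missed;
}
```

max_missed trades failover latency against false positives. Declaring a merely slow master dead leaves two active copies of the service, which is one reason the slides leave the exact definition of failure to the application.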

Application Interface to Framework
- philosophy: everything that can be supported application-transparently should be, but that should not prevent any tricks the application might want to pull
- generally, what needs to be done:
  - reserve checkpoint memory with hsmalloc()
  - group essential memory, e.g. into structs
  - register some additional info: hsfdreg(), hsthreadreg()
  - sprinkle checkpoints into appropriate places: hscpt()
- restore is handled in the framework as well
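
A sketch of what an adapted application might look like, using a hypothetical tetris-like game. The slides name hsmalloc(), hsfdreg() and hscpt() but give no prototypes, so the stubs below use guessed signatures and do no real checkpointing: hsmalloc() just calls calloc() and hscpt() only counts calls. The pattern being shown is grouping all essential state into one struct in checkpoint memory and checkpointing right after each complete, consistent update.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in stubs with GUESSED signatures, so the sketch compiles on its own. */
static int hs_ncheckpoints;
static void *hsmalloc(size_t len) { return calloc(1, len); }  /* real one: tracked memory */
static int hsfdreg(int fd, const char *path) { (void)fd; (void)path; return 0; }
static int hscpt(void) { hs_ncheckpoints++; return 0; }

/* Hypothetical game: everything that must survive a failover, in one struct. */
struct game_state {
    int score;
    int level;
    char board[20][10];
};

static struct game_state *gs;

static int app_init(void)
{
    gs = hsmalloc(sizeof *gs);
    if (gs == NULL)
        return -1;
    return hsfdreg(1, "/dev/stdout");  /* stdout must be recreated on the spare */
}

/* Checkpoint right after a complete update, when the state is consistent. */
static void app_line_cleared(void)
{
    assert(gs != NULL);
    gs->score += 100;
    if (gs->score % 1000 == 0)
        gs->level++;
    hscpt();
}
```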

Adapting
- the kernel portion should in theory be adaptable to other systems
  - Linux & FreeBSD & Chorus investigated
- the userspace library should be portable code as-is
- adapting applications is an interesting question
  - most UNIX programs are stateless
  - state is tied to TCP
  - persistent state is dealt with by application-specific methods
- tetris was easy to adapt
- sqlite almost equally easy

Conclusions
- transparent checkpointing has problems
- application-driven checkpointing ties application semantics to the task of checkpointing
  - this knowledge can be used to optimize checkpoint time & place
- kernel support provides an additional boost
- state is annoyingly tied to TCP, but at least with application-driven checkpointing we have a chance to deal with it
- adaptation effort depends greatly on the application
