Checkpointing for the RESTART Problem in Markov Networks Lester - PowerPoint PPT Presentation

Overview of ME distributions 2 Failure Recover Scenarios 7 A Taboo Process - Two Absorbing States 14 RESTART and Checkpoints for Markov Models 18 Example 31
Checkpointing for the RESTART Problem in Markov Networks
Lester Lipsky, Derek Doran, Swapna Gokhale (with lots of help from Steve Thompson)
Department of Computer Science & Engineering, University of Connecticut
New Frontiers in Applied Probability at Sandbjerg Estate, Sønderborg, 1-5 August 2011
Conference in Honour of Søren Asmussen on the occasion of his 65th Birthday
Lipsky, Doran, Gokhale Checkpointing for the RESTART Problem in Markov Networks

Overview (slide 1)
◮ Overview of ME distributions (slide 2)
◮ Failure Recovery Scenarios (slide 7)
◮ A Taboo Process - Two Absorbing States (slide 14)
◮ RESTART and Checkpoints for Markov Models (slide 18)
◮ Example (slide 31)

Matrix Exponential (ME) Distributions - I (slide 2)
[Figure: subsystem with M nodes (phases)]

Matrix Exponential (ME) Distributions - II (slide 3)
◮ Let $\mathbf{P}$ be an $M \times M$ transition matrix such that $\mathbf{I} - \mathbf{P}$ has an inverse;
◮ Let $\boldsymbol{\varepsilon}'$ be an $M$-dimensional column vector of all 1's;
◮ Let $\mathbf{p}$ be an $M$-dimensional row vector where $(\mathbf{p})_i$ is the probability that the process will start at node $i$, and $\mathbf{p}\,\boldsymbol{\varepsilon}' = 1$;
◮ Let each of the $M$ nodes have an exponential service time distribution with rate $\mu_i = (\mathbf{M})_{ii} > 0$ ($\mathbf{M}$ is a diagonal matrix);
◮ Let $T$ be the time from entry to departure.

Matrix Exponential (ME) Distributions - III (slide 4)
◮ Define $\mathbf{B} = \mathbf{M}(\mathbf{I} - \mathbf{P})$ and $\mathbf{V} = \mathbf{B}^{-1}$.
◮ Then the probability distribution function (PDF), reliability function, and probability density function (pdf) for $T$ are
$$F(t) := \Pr[T \le t] = 1 - \mathbf{p}\,\exp(-t\mathbf{B})\,\boldsymbol{\varepsilon}', \qquad \bar{F}(t) = 1 - F(t), \qquad f(t) = \frac{dF}{dt} = \mathbf{p}\,\exp(-t\mathbf{B})\,\mathbf{B}\,\boldsymbol{\varepsilon}'.$$
◮ Also,
$$E[T^{\ell}] = \ell!\;\mathbf{p}\,\mathbf{V}^{\ell}\,\boldsymbol{\varepsilon}'.$$
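
The moment formula $E[T^{\ell}] = \ell!\,\mathbf{p}\mathbf{V}^{\ell}\boldsymbol{\varepsilon}'$ can be checked on a small case. The sketch below is only an illustration: the two-phase Erlang example, the rate `mu = 2`, and the helper names (`identity`, `matmul`, `inverse`, `me_moment`) are assumptions that do not come from the slides. It uses exact fractions so the results can be compared with the known Erlang-2 moments $2/\mu$ and $6/\mu^2$.

```python
from fractions import Fraction
from math import factorial


def identity(n):
    """n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse of a small non-singular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + unit for row, unit in zip(a, identity(n))]
    for col in range(n):
        # Find a row with a non-zero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def me_moment(p, P, rates, ell):
    """E[T^ell] = ell! * p V^ell eps', with B = M (I - P) and V = B^{-1}."""
    n = len(rates)
    eye = identity(n)
    # M is diagonal, so row i of (I - P) is simply scaled by rates[i].
    B = [[rates[i] * (eye[i][j] - P[i][j]) for j in range(n)] for i in range(n)]
    V = inverse(B)
    V_pow = identity(n)
    for _ in range(ell):
        V_pow = matmul(V_pow, V)
    row = matmul([p], V_pow)[0]      # p V^ell
    return factorial(ell) * sum(row)  # multiplying by eps' sums the entries


# Hypothetical example: two phases in series (Erlang-2), both with rate mu.
mu = Fraction(2)
P = [[Fraction(0), Fraction(1)], [Fraction(0), Fraction(0)]]
p = [Fraction(1), Fraction(0)]
mean = me_moment(p, P, [mu, mu], 1)
second = me_moment(p, P, [mu, mu], 2)
print(mean, second)  # 1 3/2
```

For larger models a numerical linear algebra library would be the natural choice; exact fractions are used here only to keep the check free of rounding.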

ME Representation of the Uniform Distribution (slide 5)
[Figure: density functions U_2(t), U_3(t), U_4(t), U_5(t), U_6(t), U_7(t), U_8(t), U_10(t), U_20(t), U_40(t), U_80(t), U_120(t), and U_200(t) plotted against t on [0, 2.5]; vertical axis "Density Function, Uniform" from 0 to 1.]

Truncated Power-tail (TPT) Distributions (slide 6)
[Figure: log-log plot of the reliability function $R_T(x) = \Pr(\text{BurstLength} > x)$ against $x$ for T = 1, 10, 20, 30, 40, annotated with $R_\infty(x) \to c\,x^{-\alpha}$.]

Recovery Scenarios (slide 7)
Three general scenarios describe how a system recovers after it crashes during execution:
◮ preemptive resume (prs) - RESUME
◮ preemptive repeat different (prd) - REPLACE
◮ preemptive repeat identical (pri) - RESTART
RESUME and REPLACE can be analyzed by Markov models. RESTART, however, is more difficult to treat.

The Performance of Systems Under RESTART - I (slide 8)
◮ Let $T$ be the time for a job to complete without failures.
◮ Let $F(t)$, $f(t)$, and $\bar{F}(t) = 1 - F(t)$ be the PDF, pdf, and reliability function for $T$.
◮ Assume that the failure distribution is exponential with failure rate $\beta$. For $T = t$, let $X(t, \beta)$ be the completion time with failures under RESTART, with PDF $H(x \mid t)$. Its Laplace transform was shown to be
$$H^{*}(s \mid t) = \frac{(s+\beta)\,e^{-(s+\beta)t}}{s + \beta\,e^{-(s+\beta)t}}.$$
◮ Since this transform generates the moments of $H(x \mid t)$, we have in general
$$E[X(t,\beta)^{\ell}] = (-1)^{\ell} \left.\frac{d^{\ell} H^{*}(s \mid t)}{ds^{\ell}}\right|_{s=0}.$$
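
As a rough numerical check of the moment relation for $\ell = 1$, the sketch below differentiates $H^{*}(s \mid t)$ at $s = 0$ with a central difference and compares the result with the closed-form mean $(e^{\beta t} - 1)/\beta$. The function names, the step size `h`, and the values `t = 1.0`, `beta = 0.5` are illustrative assumptions, not part of the slides.

```python
import math


def H_star(s, t, beta):
    """Laplace transform of the completion time under RESTART for fixed T = t."""
    u = s + beta
    return u * math.exp(-u * t) / (s + beta * math.exp(-u * t))


def mean_from_transform(t, beta, h=1e-5):
    """E[X(t, beta)] = -dH*/ds at s = 0, approximated by a central difference."""
    return -(H_star(h, t, beta) - H_star(-h, t, beta)) / (2.0 * h)


t, beta = 1.0, 0.5  # hypothetical job length and failure rate
approx = mean_from_transform(t, beta)
exact = (math.exp(beta * t) - 1.0) / beta
print(approx, exact)
```

The two printed values should agree to several decimal places; higher moments could be obtained the same way with higher-order differences, at the cost of more numerical noise.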

The Performance of Systems Under RESTART - II (slide 9)
◮ Since $T = t$ throughout a RESTART process, it follows that
$$E[X(\beta)^{\ell}] = \int_0^{\infty} E[X(t,\beta)^{\ell}]\, f(t)\, dt.$$
◮ In particular, for $\ell = 1$ we have
$$E[X(t,\beta)] = \frac{e^{\beta t} - 1}{\beta} \qquad\text{and}\qquad E[X(\beta)] = \int_0^{\infty} \frac{e^{\beta t} - 1}{\beta}\, f(t)\, dt.$$
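
A small Monte Carlo sketch can make the RESTART mechanism concrete. It assumes, as the formulas above implicitly do, that recovery takes no time and that the same job length $T$ is kept across restarts. The function names, sample sizes, seeds, and parameter values are hypothetical choices for illustration. For the second experiment $T$ is taken to be exponential with rate 1, for which the integral above evaluates to $1/(1 - \beta)$ when $\beta < 1$.

```python
import math
import random


def restart_completion_time(t, beta, rng):
    """Total time to finish a job that needs t uninterrupted time units."""
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(beta)
        if time_to_failure >= t:
            return total + t          # this attempt finishes before any failure
        total += time_to_failure      # work is lost; the same job starts over


def estimate_mean(sample_t, beta, n, seed=0):
    """Average completion time over n jobs; sample_t(rng) draws one job length."""
    rng = random.Random(seed)
    return sum(restart_completion_time(sample_t(rng), beta, rng)
               for _ in range(n)) / n


# Fixed job length t = 1 with beta = 1: compare with (e^{beta t} - 1) / beta.
fixed_estimate = estimate_mean(lambda rng: 1.0, 1.0, 200_000)
fixed_exact = math.e - 1.0

# Exponential job length (rate 1) with beta = 0.2: compare with 1 / (1 - beta).
random_estimate = estimate_mean(lambda rng: rng.expovariate(1.0), 0.2, 200_000, seed=1)
random_exact = 1.0 / (1.0 - 0.2)

print(fixed_estimate, fixed_exact)
print(random_estimate, random_exact)
```

The failure rate in the second experiment is kept small on purpose: as $\beta$ approaches the rate of $T$, the sample mean converges very slowly.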

The Performance of Systems Under RESTART - III (slide 10)
Define
$$\lambda_s := \sup\left\{\lambda \;\middle|\; \int_0^{\infty} \exp(\lambda t)\, f(t)\, dt < \infty\right\}.$$
Also define
$$\alpha := \sup\left\{\ell \;\middle|\; \int_0^{\infty} x^{\ell}\, h(x)\, dx < \infty\right\},$$
where $h(x)$ is the pdf of $X(\beta)$ (the total completion time under RESTART). Then $X(\beta)$ is power-tailed (PT) with index $\alpha$ if $0 < \alpha < \infty$.

The Performance of Systems Under RESTART - IV (slide 11)
From these definitions we have the following.
◮ If $T$ has infinite support, $X(\beta)$ is sub-exponential.
◮ $f(t)$ has an exponential tail with parameter $\lambda_s$ if $0 < \lambda_s < \infty$. If $\lambda_s = 0$, then $f(t)$ is sub-exponential.
◮ If $T$ has an exponential tail with parameter $\lambda_s$, then $X(\beta)$ is PT with index $\alpha = \lambda_s/\beta$. Thus, as $\beta$ becomes bigger, $\alpha$ becomes smaller, fewer moments of $X(\beta)$ remain finite, and the system behavior becomes more unstable.
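
To see how the index $\alpha = \lambda_s/\beta$ limits the moments of $X(\beta)$, the following sketch lists the integer orders $\ell < \alpha$, which by the definition of $\alpha$ are the ones with a finite $\ell$-th moment. The function names, the cap `max_order`, and the sample rates are illustrative assumptions; the boundary case $\ell = \alpha$ is deliberately avoided.

```python
def pt_index(lambda_s, beta):
    """Power-tail index alpha = lambda_s / beta of the RESTART completion time."""
    return lambda_s / beta


def finite_integer_moments(lambda_s, beta, max_order=10):
    """Integer orders ell (up to max_order) with ell < alpha."""
    alpha = pt_index(lambda_s, beta)
    return [ell for ell in range(1, max_order + 1) if ell < alpha]


lambda_s = 1.0  # hypothetical exponential-tail parameter of T
for beta in (0.4, 0.8, 1.6):
    print(beta, pt_index(lambda_s, beta), finite_integer_moments(lambda_s, beta))
```

With these values, a failure rate of 0.4 leaves a finite mean and variance, 0.8 leaves only a finite mean, and 1.6 leaves no finite integer moment at all.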

Markov Models of Software (MMS model) (slide 12)
◮ Software systems (among others) are highly modular: system control is passed among independent components.
◮ The passing of control among the $M$ components (nodes) maps to an $M$-dimensional Markov matrix, $\mathbf{P}$.
◮ Assume that:
  ◮ the service time at each node is exponentially distributed with rate $\mu_i := [\mathbf{M}]_{ii} > 0$;
  ◮ there is a path to exit the system from each node.
Then, as previously described, the total execution time $T$ is ME distributed (in fact, phase-type).
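
The MMS model can be simulated directly: start at a component chosen according to $\mathbf{p}$, spend an exponential time there, then pass control according to $\mathbf{P}$ or leave the system with the remaining probability. The sketch below does this for a hypothetical two-component system (the matrix entries, rates, sample size, and function names are all assumptions made for illustration) and compares the sample mean with $\mathbf{p}\mathbf{V}\boldsymbol{\varepsilon}'$, where $\mathbf{V} = [\mathbf{M}(\mathbf{I} - \mathbf{P})]^{-1}$.

```python
import random


def sample_execution_time(p, P, rates, rng):
    """One realization of the total execution time T without failures."""
    node = rng.choices(range(len(p)), weights=p)[0]
    total = 0.0
    while node is not None:
        total += rng.expovariate(rates[node])   # exponential service at this node
        u, cumulative, next_node = rng.random(), 0.0, None
        for j, prob in enumerate(P[node]):
            cumulative += prob
            if u < cumulative:
                next_node = j
                break
        node = next_node                         # None means the job has exited
    return total


def exact_mean_2x2(p, P, rates):
    """p V eps' for a two-node model, using the closed-form 2 x 2 inverse."""
    b = [[rates[i] * ((1.0 if i == j else 0.0) - P[i][j]) for j in range(2)]
         for i in range(2)]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    v = [[b[1][1] / det, -b[0][1] / det], [-b[1][0] / det, b[0][0] / det]]
    return sum(p[i] * v[i][j] for i in range(2) for j in range(2))


# Hypothetical two-component system; each row of P sums to less than 1,
# so every node has a path to exit.
P = [[0.0, 0.5], [0.2, 0.0]]
rates = [1.0, 2.0]
p = [1.0, 0.0]

rng = random.Random(42)
n = 100_000
simulated_mean = sum(sample_execution_time(p, P, rates, rng) for _ in range(n)) / n
exact_mean = exact_mean_2x2(p, P, rates)
print(simulated_mean, exact_mean)
```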

The MMS Model Under RESTART (slide 13)
For ME distributions, $\lambda_s := \min_i |\lambda_i|$, where $\{\lambda_i \mid 1 \le i \le M\}$ is the set of eigenvalues of $\mathbf{B}$ whose eigenvectors are not orthogonal to $\mathbf{p}$ or $\boldsymbol{\varepsilon}'$.
◮ If the MMS model is subject to exponential failures and must RESTART, $X(\beta)$ is PT distributed with $\alpha = \lambda_s/\beta$.
◮ The first two moments of $X(\beta)$ are given by
$$E[X(\beta)] = \mathbf{p}\left[\mathbf{V}(\mathbf{I} - \beta\mathbf{V})^{-1}\right]\boldsymbol{\varepsilon}' \qquad (\beta < \lambda_s),$$
$$E[X(\beta)^2] = 2\,\mathbf{p}\left[\mathbf{V}^2 (\mathbf{I} - 2\beta\mathbf{V})^{-1} (\mathbf{I} - \beta\mathbf{V})^{-2}\right]\boldsymbol{\varepsilon}' \qquad (\beta < \lambda_s/2),$$
even though $X(\beta)$ is not ME for $\beta > 0$.
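
The first-moment formula can be checked on a case where the answer is known in closed form. The sketch below assumes a two-phase Erlang execution time with rate `mu = 2` per phase and a failure rate `beta = 1/2`; these values and the helper names are illustrative assumptions. For Erlang-2, $E[e^{\beta T}] = (\mu/(\mu - \beta))^2$, so $E[X(\beta)] = \big((\mu/(\mu-\beta))^2 - 1\big)/\beta$, and $\lambda_s = \mu$.

```python
from fractions import Fraction


def inv2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(x, y):
    """Product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def restart_mean(p, V, beta):
    """E[X(beta)] = p V (I - beta V)^{-1} eps', meaningful only for beta < lambda_s."""
    i_minus_bv = [[(1 if i == j else 0) - beta * V[i][j] for j in range(2)]
                  for i in range(2)]
    k = mul2(V, inv2(i_minus_bv))
    return sum(p[i] * k[i][j] for i in range(2) for j in range(2))


mu, beta = Fraction(2), Fraction(1, 2)      # hypothetical rates, beta < mu
B = [[mu, -mu], [Fraction(0), mu]]          # B = M (I - P) for two phases in series
V = inv2(B)
p = [Fraction(1), Fraction(0)]

matrix_mean = restart_mean(p, V, beta)
closed_form = ((mu / (mu - beta)) ** 2 - 1) / beta
alpha = mu / beta                           # power-tail index lambda_s / beta
print(matrix_mean, closed_form, alpha)      # 14/9 14/9 4
```

Without failures the mean execution time in this example is 1, so a failure rate of 1/2 stretches it to 14/9 under RESTART.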

Markov Chains with Two Absorbing States - I (slide 14)
◮ Consider an $(M+2)$-dimensional Markov matrix $\bar{\mathbf{P}}$ with two absorbing states, $a$ and $b$. That is,
$$\bar{\mathbf{P}}\,\bar{\boldsymbol{\varepsilon}}' = \bar{\boldsymbol{\varepsilon}}' \qquad\text{and}\qquad (\bar{\mathbf{P}})_{aa} = (\bar{\mathbf{P}})_{bb} = 1.$$
◮ Deleting the rows and columns of $a$ and $b$ gives $\mathbf{P}$.
◮ Then $[\mathbf{Z}]_{ij} := [(\mathbf{I} - \mathbf{P})^{-1}]_{ij}$ is the expected number of visits to $j$ before absorption, given that the chain started at $i$.
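
One way to see why $(\mathbf{I} - \mathbf{P})^{-1}$ counts visits is the series $\mathbf{Z} = \mathbf{I} + \mathbf{P} + \mathbf{P}^2 + \cdots$, whose $n$-th term gives the probability of being at $j$ after $n$ steps without having been absorbed. The sketch below sums this series for a hypothetical four-state chain $a, 1, 2, b$ in which each transient state moves left or right with probability 1/2; the example, the truncation at 200 terms, and the function names are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expected_visits(P, terms=200):
    """Approximate Z = (I - P)^{-1} by the partial sum I + P + ... + P^terms."""
    n = len(P)
    Z = [[float(i == j) for j in range(n)] for i in range(n)]  # the P^0 term
    power = [row[:] for row in Z]
    for _ in range(terms):
        power = matmul(power, P)
        Z = [[Z[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return Z


# Transient part of the chain a <- 1 <-> 2 -> b after deleting rows/columns a and b.
P = [[0.0, 0.5],
     [0.5, 0.0]]
Z = expected_visits(P)
print(Z)  # approximately [[4/3, 2/3], [2/3, 4/3]]
```

Starting in state 1, the chain is therefore expected to visit state 1 a total of 4/3 times (counting the start) and state 2 a total of 2/3 times before it is absorbed.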
