
You Only Live Multiple Times: Black-Box Re-use of Crash-Stop Algorithms in Realistic Crash-Recovery Settings
David Kozhaya (1), Ognjen Maric (2), and Yvonne-Anne Pignolet (1)
(1) ABB Corporate Research, Switzerland; (2) Digital Asset, Switzerland
A big thank you to Klaus Tycho-Foerster for presenting on our behalf!

— Mitigating the Effect of Failures: Fault Tolerance in Distributed Systems
• Failures happen in real systems.
• Consensus is the typical way to mitigate the effect of failures: to minimize the impact of failures on service interruption, implement consensus protocols that tolerate failures.
• Distributed parties agree on the actions to perform despite failures.
• Consensus protocols have been studied under different synchrony and failure assumptions.
1/8/2019 You only live multiple times, Kozhaya, Maric, Pignolet, OPODIS 2018

— Solving Consensus in the Presence of Failures
Asynchronous system model:
• Unbounded processing delays
• Unbounded communication delays
Crash-stop failure model (CS model):
• The simplest failure model
• A process crashes by permanently ceasing to execute the algorithm
[Diagram labels: Algorithm Solving Consensus; Failure Detector; Partial synchrony + Crash-stop failures; Asynchronous system + Crash-stop failures]
Partial synchrony and failure detectors rely on system conditions that eventually hold forever.
In reality, the failure and recovery modes of processes and links are probabilistic and temporary.

— Crash-Recovery Settings: A Way to Capture a System's Dynamicity
Crash-recovery failure model (CR model):
• Processes can join and leave unannounced
• Does not address communication
Subsequently:
• New failure detectors, with new consensus algorithms on top
• Processes that crash and recover infinitely often are excluded: they are not required to satisfy the consensus properties
[Diagram labels: New Algorithm Solving Consensus; New Failure Detector; Asynchronous system + Crash-Recovery failures]
What remains unanswered: can the plethora of existing crash-stop algorithms be reused unchanged in crash-recovery settings?

— Our Contribution
Re-use crash-stop consensus algorithms that rely on reliable links and failure detectors in the crash-recovery model.
[Diagram labels: Crash-Stop Consensus Algorithm; Failure Detector; Reliable Channel; Our solution; a system where all processes and links can crash and recover unboundedly]
Deterministic CS (crash-stop) consensus algorithms implement consensus with probability 1 in CR systems where processes and links crash and recover unboundedly.

— The Rest of This Talk
• What is different about our approach compared to existing work
• What we assume in our models
• How our wrapper works
• Which class of algorithms benefits from our results

— What is different compared to existing work?

— Difference with Existing Literature
Existing crash-recovery deterministic solutions:
• Implement consensus deterministically
• Exclude processes that crash and recover infinitely often
• Introduce new failure detector definitions and consensus algorithms
Existing probabilistic solutions:
• Implement consensus with probability 1
• Introduce new consensus algorithms, e.g., based on random coin flips
Our approach:
• Implements consensus with probability 1
• Includes processes and links that crash and recover unboundedly
• Is modular: it does not introduce new algorithms but rather uses existing crash-stop algorithms

— Description of Our Models

— Reliable Asynchronous Crash-Stop Model
[Diagram labels: CS Algorithm; Reliable Channel; Failure Detector]
Time: asynchronous processes (no clocks), asynchronous links
Failures: processes may crash; all messages get delivered

— Lossy Synchronous Crash-Recovery Model
[Diagram labels: CR Algorithm; Lossy Channel]
Time: synchronous steps, synchronous links (upper bounds on step and transmission times)
Failures: processes may crash and recover infinitely often; messages may get lost

— Probabilistic Crash-Recovery Model
[Diagram labels: CR Algorithm; Probabilistic Lossy Channel]
Time: synchronous processes, synchronous links
Failures: processes and links crash and recover with probability in (0,1)
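To make the probabilistic crash-recovery model more concrete, here is a minimal simulation sketch. It assumes, beyond what the slide states, that in every synchronous step each process is up independently with probability `p_up` and each directed link delivers independently with probability `p_link`. The names `sample_step`, `steps_until_delivery`, `p_up`, and `p_link` are hypothetical and chosen only for illustration.

```python
import random


def sample_step(n_procs, p_up, p_link, rng):
    """Sample one synchronous step of the (assumed) probabilistic model.

    Returns a list saying which processes are up in this step and a dict
    saying which directed links deliver in this step.
    """
    up = [rng.random() < p_up for _ in range(n_procs)]
    delivers = {
        (i, j): rng.random() < p_link
        for i in range(n_procs)
        for j in range(n_procs)
        if i != j
    }
    return up, delivers


def steps_until_delivery(src, dst, n_procs, p_up, p_link, rng, max_steps=100_000):
    """Count steps until src and dst are both up and the src->dst link delivers.

    Returns None if no such step is observed within max_steps.
    """
    for step in range(1, max_steps + 1):
        up, delivers = sample_step(n_procs, p_up, p_link, rng)
        if up[src] and up[dst] and delivers[(src, dst)]:
            return step
    return None


rng = random.Random(0)  # fixed seed so that runs are repeatable
print(steps_until_delivery(0, 1, n_procs=3, p_up=0.5, p_link=0.5, rng=rng))
```

Under these independence assumptions, every step has a fixed positive chance of being one in which a given message gets through, so a sender that keeps retrying is eventually heard with probability 1, even though no single step offers a guarantee.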

— Model Overview
• Reliable Asynchronous CS: asynchronous processes and links; processes may crash; reliable communication
• Lossy Synchronous CR: synchronous processes and links; processes may crash and recover; lossy communication
• Probabilistic CR: synchronous processes and links; processes crash and recover probabilistically; messages dropped probabilistically

— How our Wrapper works

— Crash-Recovery Wrapper
The red box is our wrapper.
[Diagram labels: Crash-Stop Consensus Algorithm; Reliable Channel; Failure Detector; crash-stop messages and failure detector output; crash-recovery messages and acks; Probabilistic Channel]
• Create a synchronous crash-recovery step using multiple crash-stop steps (each handling one message)
• Round-by-round failure detector to produce the outputs fed to the CS algorithm
• Provide reliable links by LIFO buffering and retransmission until acknowledged
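The last wrapper bullet (LIFO buffering and retransmission until ack) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the slides do not detail the buffering, so the sketch assumes the newest unacknowledged message is retransmitted first, models sender and receiver inside one object, and ignores process crashes. `LossyChannel`, `ReliableLink`, and `p_deliver` are hypothetical names.

```python
import random


class LossyChannel:
    """Hypothetical channel that drops each transmission independently."""

    def __init__(self, p_deliver, rng):
        self.p_deliver = p_deliver
        self.rng = rng

    def transmit(self, payload):
        # Return the payload if it gets through, otherwise None (lost).
        return payload if self.rng.random() < self.p_deliver else None


class ReliableLink:
    """Retransmit-until-ack link with a LIFO buffer (illustrative only)."""

    def __init__(self, channel):
        self.channel = channel
        self.buffer = []      # unacknowledged messages, newest at the end
        self.next_id = 0
        self.delivered = []   # receiver side: messages in delivery order
        self.seen = set()     # receiver side: ids already delivered

    def send(self, msg):
        self.buffer.append((self.next_id, msg))
        self.next_id += 1

    def step(self):
        """One synchronous step: retransmit the newest unacknowledged message."""
        if not self.buffer:
            return
        msg_id, msg = self.buffer[-1]            # LIFO: top of the stack
        arrived = self.channel.transmit((msg_id, msg))
        if arrived is None:
            return                               # message lost, retry next step
        if msg_id not in self.seen:              # receiver filters duplicates
            self.seen.add(msg_id)
            self.delivered.append(msg)
        ack = self.channel.transmit(msg_id)      # the ack may be lost as well
        if ack is not None:
            self.buffer.pop()                    # acknowledged: stop retransmitting


link = ReliableLink(LossyChannel(0.5, random.Random(0)))
for m in ["a", "b", "c"]:
    link.send(m)
steps = 0
while link.buffer and steps < 10_000:
    link.step()
    steps += 1
print(sorted(link.delivered), "after", steps, "steps")
```

Because a lost ack triggers a retransmission of a message the receiver already has, the receiver keeps a set of seen ids so that each message is handed to the algorithm only once.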

— Which algorithms benefit from our results?

— Bounded Algorithms: The Class of Algorithms to Which Our Results Apply
A “bounded” crash-stop consensus algorithm satisfies, for fixed B, B_s, B_Δ:
(B1) Communication-closed rounds: processes operate in rounds, and only messages from the current round are considered.
(B2) Externally triggered state changes: in every round, a process changes state only upon message receipt or a change in failure detector output.
(B3) Bounded round messages: in any round, a process sends at most B_s messages to any other process.
(B4) Bounded round gap: the fastest (n-f) processes are always at most B_Δ rounds apart.
(B5) Bounded termination: given any time t such that the fastest (n-f) processes are correct, all other processes are faulty, and the failure detector output is perfect after t, all (n-f) fastest processes decide before any of them reaches round B_max = max_round(t) + B.
Theorem. If a bounded algorithm solves consensus in the crash-stop setting, then the same algorithm, run with our wrapper, solves consensus with probability 1 in our crash-recovery setting.
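As a small illustration of condition (B4), the sketch below checks the round gap for a single snapshot of per-process round numbers. The condition itself must hold at all times, so this only shows what is being measured. The function names and the example snapshot are hypothetical.

```python
def fastest_round_gap(rounds, f):
    """Gap between the highest and lowest round among the fastest n - f processes."""
    n = len(rounds)
    if not 0 <= f < n:
        raise ValueError("f must satisfy 0 <= f < n")
    fastest = sorted(rounds, reverse=True)[: n - f]
    return fastest[0] - fastest[-1]


def satisfies_round_gap(rounds, f, b_delta):
    """True if, in this snapshot, the fastest n - f processes are within b_delta rounds."""
    return fastest_round_gap(rounds, f) <= b_delta


rounds = [7, 7, 6, 2, 5]  # hypothetical snapshot for n = 5 processes
print(fastest_round_gap(rounds, f=2))
print(satisfies_round_gap(rounds, f=2, b_delta=1))
```

Slow or crashed processes (here the ones in rounds 2 and 5 when f = 2) do not count toward the gap, which is why the bound is stated for the fastest (n-f) processes only.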

— Bounded Algorithms: Examples
Examples of existing algorithms that are bounded:
• The Chandra-Toueg algorithm [1]
• Algorithms in the generic indulgent framework of [2]
[1] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2):225-267, 1996.
[2] Rachid Guerraoui and Michel Raynal. A generic framework for indulgent consensus. In ICDCS, 2003.

— Conclusion
• Introduced system models that closely capture the messy reality of distributed systems
• Allowed processes and links to fail and recover an unbounded number of times
• Proposed a wrapper to deploy crash-stop algorithms as a black box in our crash-recovery setting
• Determined the conditions for reusing crash-stop algorithms unchanged in our crash-recovery setting
Contact: david.kozhaya@ch.abb.com; ogi.yolmt@mynosefroze.com; yvonneanne@pignolet.ch
