
Software Fault Tolerance of Concurrent Programs Using Controlled Re-execution



  1. Software Fault Tolerance of Concurrent Programs Using Controlled Re-execution
     Ashis Tarafdar (ashis@cs.utexas.edu), Vijay K. Garg (garg@ece.utexas.edu)
     Parallel and Distributed Systems Laboratory, Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, USA 78712
     http://maple.ece.utexas.edu

  2. Introduction
     - Software fault tolerance: ensuring that the system continues normal operation despite the presence of software faults (bugs).
     - Software faults cause software failures.

  3. Goals
     - A new approach to software fault tolerance
     - The predicate control problem: introduction and results

  4. Background: Software Fault Tolerance
     The Progressive Retry Approach [Wang et al., 1997]:
     - Software failures are often transient.
     - Roll back and re-execute.
     - No guarantee that re-execution avoids the failure.

  5. Background: Races in Concurrent Programs
     What is a race? A race occurs when two processes can concurrently access the same shared resource.
     [Figure: two computations on processes P1 and P2 with critical sections cs1 and cs2, captioned "A race in a concurrent computation" and "A race-free computation". Labels: critical section, synchronization.]
     Races are an important class of software faults [Iyer & Lee, 95].
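The definition of a race on this slide can be illustrated with a small check. The sketch below is not from the slides: it assumes each critical section is recorded as a pair of vector clocks (start, end), and the names `happened_before` and `is_race` are hypothetical. Two critical sections are treated as racing when neither one finishes before the other starts.

```python
def happened_before(u, v):
    """Vector-clock test (an assumption of this sketch): u is strictly before v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def is_race(cs_a, cs_b):
    """Each critical section is (start_clock, end_clock).

    The two sections race if neither one ends before the other starts.
    """
    a_first = happened_before(cs_a[1], cs_b[0])
    b_first = happened_before(cs_b[1], cs_a[0])
    return not (a_first or b_first)


# No synchronization between P1 and P2: the critical sections can overlap.
unsynchronized = is_race(((1, 0), (2, 0)), ((0, 1), (0, 2)))

# P2 enters its critical section only after hearing from P1 (clock (3, 2)).
synchronized = is_race(((1, 0), (2, 0)), ((3, 2), (3, 3)))
```

Here `unsynchronized` is true and `synchronized` is false, matching the two cases the slide contrasts.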

  6. The Controlled Re-execution Approach
     1. Tracing an execution
     2. Detecting a race failure
     3. Determining a control strategy
     4. Re-executing under control
     [Figure: "Traced Computation" and "Controlling Computation" on processes P1, P2, P3 with critical sections cs1 to cs4; the controlling computation carries added synchronization.]

  7. Model
     - computation (happened before)
     - global state
     - consistent global state
     - global predicate (e.g. mutual exclusion)
     [Figure: a computation on processes P1, P2, P3 with critical sections cs1 to cs4 and two global states, G and H, illustrating consistent and inconsistent states.]
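The notions of consistent global state and global predicate can be made concrete with a minimal sketch. The encoding is an assumption of this example, not taken from the slides: a global state is a tuple `cut` where `cut[i]` counts the events of process i that have executed, messages are (send, receive) pairs of 1-based event positions, and the function names are hypothetical.

```python
def is_consistent(cut, messages):
    """A cut is consistent if every included receive has its send included.

    messages: list of ((send_proc, send_idx), (recv_proc, recv_idx)),
    with 1-based event indices on each process.
    """
    return all(
        send_idx <= cut[send_proc]
        for (send_proc, send_idx), (recv_proc, recv_idx) in messages
        if recv_idx <= cut[recv_proc]
    )


def mutual_exclusion(cut, critical_sections):
    """Global predicate: at most one process is inside a critical section.

    critical_sections[p] lists (start_idx, end_idx) pairs; process p is
    inside after executing start_idx and before executing end_idx.
    """
    inside = sum(
        any(start <= cut[p] < end for start, end in sections)
        for p, sections in enumerate(critical_sections)
    )
    return inside <= 1


# Event 2 of process 0 sends a message received as event 1 of process 1.
messages = [((0, 2), (1, 1))]
sections = [[(1, 3)], [(2, 4)]]
```

With these inputs, the cut `(1, 1)` is inconsistent because it includes the receive but not the send, while `(2, 1)` is consistent. The cut `(1, 2)` violates mutual exclusion because both processes are inside their critical sections.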

  8. The Off-line Predicate Control Problem
     Problem statement: Given a computation C and a global predicate B, find a controlling computation of B in C.
     Note: A controlling computation must have no cycles!
     [Figure: "Computation C" and "Controlling Computation C' of B in C", with B = mutual exclusion, on processes P1, P2, P3 with critical sections cs1 to cs4 and a global state G.]
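The requirement that a controlling computation have no cycles can be checked mechanically. The following is a minimal sketch, assuming events are plain strings and both the original happened-before edges and the added synchronization are given as (earlier, later) pairs; `is_acyclic_control` is a hypothetical name. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic_control(edges, added_sync):
    """Return True if the computation plus added synchronization has no cycle."""
    predecessors = {}
    for earlier, later in list(edges) + list(added_sync):
        predecessors.setdefault(later, set()).add(earlier)
        predecessors.setdefault(earlier, set())
    try:
        # static_order() raises CycleError if the graph contains a cycle.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


# Each process enters and then leaves its critical section.
computation = [("cs1.start", "cs1.end"), ("cs2.start", "cs2.end")]

# Forcing cs1 to finish before cs2 starts is a valid control.
ok = is_acyclic_control(computation, [("cs1.end", "cs2.start")])

# Additionally forcing cs2 to finish before cs1 starts creates a cycle.
bad = is_acyclic_control(
    computation, [("cs1.end", "cs2.start"), ("cs2.end", "cs1.start")]
)
```

This only checks acyclicity; whether the resulting computation also satisfies the predicate B in every consistent global state is a separate question.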

  9. Off-line Mutual Exclusion
     Theorem: The off-line predicate control problem is NP-hard [Tarafdar & Garg, 98].
     Variants of off-line mutual exclusion:
     - Off-line Mutual Exclusion
     - Off-line Readers Writers
     - Off-line Independent Mutual Exclusion
     - Off-line Independent Read-Write Mutual Exclusion

  10. A Relation on Critical Sections
     cs1 is related to cs2 iff cs1 starts before cs2 finishes.
     [Figure: three example computations on processes P1, P2, P3 with critical sections cs1 and cs2.]
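The relation on this slide can be computed directly from a trace. The sketch below assumes, as an illustration only, that each critical section is stored as a (start, end) pair of vector clocks and that "before" means happened-before; the names `happened_before` and `cs_relation` are hypothetical.

```python
def happened_before(u, v):
    """Vector-clock test (an assumption of this sketch): u is strictly before v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def cs_relation(sections):
    """Return the pairs (x, y) such that x starts before y finishes.

    sections maps a critical-section name to (start_clock, end_clock).
    """
    return {
        (x, y)
        for x, (start_x, _) in sections.items()
        for y, (_, end_y) in sections.items()
        if x != y and happened_before(start_x, end_y)
    }


# cs1's start at clock (1, 0) is known to P2 by the time cs2 ends at (1, 2).
related = cs_relation({
    "cs1": ((1, 0), (2, 0)),
    "cs2": ((0, 1), (1, 2)),
})

# With no communication at all, no pair is related.
unrelated = cs_relation({
    "cs1": ((1, 0), (2, 0)),
    "cs2": ((0, 1), (0, 2)),
})
```

In the first example only the pair `("cs1", "cs2")` is in the relation; the second example yields the empty set.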

  11. Off-line Readers Writers: Result
     Theorem: For a computation C and a global predicate B_rw, a controlling computation of B_rw in C exists iff all cycles in the relation on critical sections contain only read critical sections.
     Proof (key ideas): necessary and sufficient directions.
     [Figure: read (R) and write (W) critical sections cs1 to cs3 on processes P1, P2, P3. Labels: write critical section, strongly connected components.]
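The condition in the theorem can be tested on a small relation graph. This is a minimal sketch under stated assumptions, not the algorithm from the slides: the relation is given as a set of (x, y) pairs, and instead of computing strongly connected components it uses a plain reachability search from each write critical section. The name `control_exists` is hypothetical.

```python
def control_exists(relation, writes):
    """Return True iff no write critical section lies on a cycle of the relation."""
    successors = {}
    for x, y in relation:
        successors.setdefault(x, set()).add(y)

    def on_cycle(node):
        # Depth-first search: can we come back to `node` along relation edges?
        stack = list(successors.get(node, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current == node:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(successors.get(current, ()))
        return False

    return not any(on_cycle(w) for w in writes)


# A cycle made only of read critical sections is allowed.
reads_only = control_exists({("r1", "r2"), ("r2", "r1"), ("r2", "w1")}, {"w1"})

# A cycle through a write critical section rules out any control.
with_write = control_exists({("r1", "w1"), ("w1", "r1")}, {"w1"})
```

Here `reads_only` is true and `with_write` is false. The search runs once per write critical section, so it is meant for illustration rather than efficiency.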

  12. Off-line Readers Writers: Algorithm
     n: number of processes
     p: number of critical sections in the computation
     - Algorithm 1: O(p^2)
     - Key idea: An SCC contains at most one CS per process
     - Algorithm 2: O(n^2 p)
     - Key idea: Only "new" CSs need be considered
     - Algorithm 3: O(np)
     [Figure: a computation on processes P1 to P4 with critical sections cs1 to cs8 and labels A and B.]

  13. Summary
     A new approach to software fault tolerance:
     - introduced the controlled re-execution approach for race faults
     - focussed on the problem of determining a control strategy
     The off-line predicate control problem, introduction and results:
     - defined the off-line predicate control problem
     - necessary and sufficient conditions for the off-line readers writers problem
     - O(np) algorithm for the off-line readers writers problem
     - also: other variants of off-line mutual exclusion

  14. On-line Mutual Exclusion is Impossible
     [Figure: three computations on processes P1 and P2 with critical sections cs1 and cs2 and global states G and H.]
