
Computing Optimal Self-Repair Actions: Damage Minimization versus Repair Time



  1. Computing Optimal Self-Repair Actions: Damage Minimization versus Repair Time
     Matthias Tichy, Holger Giese, Daniela Schilling, Wladimir Pauls
     University of Paderborn, Software Engineering Group, Prof. Dr. Wilhelm Schäfer
     Daniela Schilling, May 2005

  2. Motivation
     www.railcab.de

  3. Motivation
     - Redundant implementations of important software components
     - [Figure: deployment diagram with the components vot:Voter, pc1/pc2/pc3:Position Calculation, cc:Convoy, mul:Multiplier and gps:GPS-Controller, and the nodes Taliesin, Avalon, Uther, Gareth, Gorlois and Arthur]
     - Required: reconfiguration
     - Given: automatic detection of failed components
     - Self-Repair Actions: automatic calculation of a redeployment for failed components

  4. Initial Deployment
     - [Figure: pc1:Position Calculation deployed on Node1 and pc2:Position Calculation on Node2, with the annotation pc1.mem=2.0Mb]
     - Map deployment constraints, given as extended UML Deployment Diagrams, to inequalities over Boolean and integer variables
     - Use a constraint solver to calculate the initial deployment
     - Reference: WOSS/FSE 2004: Matthias Tichy, Daniela Schilling, Holger Giese: Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns
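
As a rough illustration of what "inequalities over Boolean and integer variables" can look like, the following Python sketch checks a memory-capacity inequality per node and a separation constraint for redundant copies, and finds a deployment by brute-force enumeration. It is not the constraint solver used by the authors; apart from pc1.mem=2.0Mb, all component names, node capacities and the separation constraint are assumptions made for this example.

```python
from itertools import product

# Hypothetical example data (only pc1's 2.0 Mb appears on the slide).
demand = {"pc1": 2.0, "pc2": 1.5, "vot": 0.5}   # memory demand per component (Mb)
capacity = {"Node1": 2.5, "Node2": 2.0}          # memory capacity per node (Mb)
separate = [("pc1", "pc2")]                      # assumed: redundant copies on distinct nodes


def feasible(assignment):
    """Check all inequalities for one candidate component -> node mapping."""
    for node, cap in capacity.items():
        used = sum(demand[c] for c, n in assignment.items() if n == node)
        if used > cap:                           # capacity inequality violated
            return False
    return all(assignment[a] != assignment[b] for a, b in separate)


def initial_deployment():
    """Enumerate every mapping and return the first feasible one, or None."""
    comps = sorted(demand)
    for nodes in product(sorted(capacity), repeat=len(comps)):
        assignment = dict(zip(comps, nodes))
        if feasible(assignment):
            return assignment
    return None


print(initial_deployment())
```

Enumeration is exponential in the number of components, which is why a real constraint solver is used for models of realistic size.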

  5. Online Redeployment
     - Node crash failure ⇒ all components running on this node fail too
     - Compute Self-Repair Action
       - Find suitable nodes to redeploy failed components
       - How to find suitable nodes?
       - What to do if there is no suitable node?
         - Redeploy further (still running) components
     - Damage: negative effects of unavailable components
     - Goal: minimize costs
       - Keep damage as low as possible
       - Reduce solving time
     - [Figure: costs over time across the phases "calculate redeployment" and "perform redeployment"; damage is caused by failed components and by components to be migrated]

  6. Online Redeployment: 1. Solution
     - Remove crashed nodes from the constraint system
     - Solve the complete constraint system again
     - [Figure: damage versus time]

  7. Online Redeployment: 2. Solution
     - Remove crashed nodes from the constraint system
     - Add an objective function (minimize damage caused by migration of running components) to the constraint system
     - Solve the complete system again
     - [Figure: damage versus time]

  8. Online Redeployment: Our Approach
     - Remove crashed nodes from the constraint system
     - Add an objective function (minimize damage) to the constraint system
     - Try to solve the constraint system for the failed components only
     - Until a solution is found: extend the set of components that have to be redeployed/migrated
     - Use a constraint solver
     - Heuristic approach
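
The loop described on this slide can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the authors' implementation: the only constraint is a per-node memory capacity, `solve` is a brute-force stand-in for the constraint solver, the objective function is left out (the first feasible placement is returned), and the names `solve`, `repair` and `candidates` are hypothetical.

```python
from itertools import product


def solve(movable, fixed, demand, capacity):
    """Brute-force stand-in for the solver: place `movable`, keep `fixed` in place."""
    free = dict(capacity)
    for comp, node in fixed.items():
        free[node] -= demand[comp]               # capacity left after fixed components
    comps = sorted(movable)
    for nodes in product(sorted(free), repeat=len(comps)):
        load = {}
        for comp, node in zip(comps, nodes):
            load[node] = load.get(node, 0) + demand[comp]
        if all(load[n] <= free[n] for n in load):
            return dict(zip(comps, nodes))
    return None


def repair(deployment, crashed, demand, capacity, candidates):
    """Solve for failed components only; extend the movable set until solvable."""
    alive = {n: cap for n, cap in capacity.items() if n != crashed}
    movable = {c for c, n in deployment.items() if n == crashed}
    extra = [c for c in candidates if c not in movable]  # running components, in priority order
    while True:
        fixed = {c: n for c, n in deployment.items()
                 if n != crashed and c not in movable}
        plan = solve(movable, fixed, demand, alive)
        if plan is not None:
            return plan                          # new node for every moved component
        if not extra:
            return None                          # nothing left to migrate
        movable.add(extra.pop(0))                # extend the submodel by one component


# Hypothetical scenario: node A crashes, x (2 units) fits only after y is migrated.
demand = {"x": 2, "y": 1, "z": 1}
capacity = {"A": 2, "B": 2, "C": 2}
deployment = {"x": "A", "y": "B", "z": "C"}
print(repair(deployment, "A", demand, capacity, ["y", "z"]))
```

Because each attempt only covers the failed components plus a few migrated ones, the submodels stay much smaller than the complete constraint system, which is the source of the reduced solving time.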

  9. Online Redeployment: Our Approach
     - [Figure: damage versus time]

  10. Choosing Components for Redeployment
     - Example: 3 redundant copies of important components
     - Algorithm:
       - Try to redeploy the failed component
       - Until redeployment is possible:
         1. Choose components that are not redundant copies of failed components
         2. Choose components where only one of three redundant copies has already failed
         3. Choose arbitrary components
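
One way to read the three selection steps is as a priority order over the still-running components. The Python sketch below sorts candidates accordingly; the function name `expansion_order`, the `group_of` mapping and the example components are assumptions made for illustration, and ties are broken alphabetically only to make the result deterministic.

```python
def expansion_order(running, failed, group_of):
    """Order running components by the three selection steps of the algorithm.

    group_of maps a component to its redundancy group; components without
    an entry are treated as non-redundant.
    """
    failed_per_group = {}
    for comp in failed:
        group = group_of.get(comp)
        if group is not None:
            failed_per_group[group] = failed_per_group.get(group, 0) + 1

    def tier(comp):
        already_failed = failed_per_group.get(group_of.get(comp), 0)
        if already_failed == 0:
            return 1     # step 1: not a redundant copy of a failed component
        if already_failed == 1:
            return 2     # step 2: only one of three copies has failed
        return 3         # step 3: arbitrary remaining components

    return sorted(running, key=lambda comp: (tier(comp), comp))


# Hypothetical example: pc1 failed (one of three), q1 and q2 failed (two of three).
group_of = {"pc1": "pc", "pc2": "pc", "pc3": "pc", "q1": "q", "q2": "q", "q3": "q"}
print(expansion_order(["q3", "pc3", "gps", "pc2"], ["pc1", "q1", "q2"], group_of))
# prints ['gps', 'pc2', 'pc3', 'q3']
```

The last surviving copy of a group (q3 here) is migrated last, since taking it offline would make the whole group unavailable.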

  12. Experiment
     - Scenario:
       - 36 nodes with 114 links
       - 72 components with 99 connectors
       - 5 node-specific (CPU, OS, Memory, Utilization, HDD) and 2 link-specific (Bandwidth, Loss) deployment restrictions
       - Set of deployment constraints on components and connectors
     - Experiment:
       - Randomly selected a node and let it fail

  13. Experimental Results

     Test   1. Solution          2. Solution          Our Algorithm
     Nr.    Time (ms)  Damage    Time (ms)  Damage    Time (ms)  Damage
     1      13630      773       > 1h       N/A       50         7
     2      14890      97        56060      29        30         30
     3      13790      4         14920      1         10         5
     4      13660      34        16430      31        50         34

     [Figure: damage versus time]

  14. Conclusion & Future Work
     - Algorithm to calculate optimal self-repair actions
     - Deployment constraints are solved by a standard constraint solver
     - The experiment showed that the algorithm is nearly optimal in damage minimization and time consumption
     - Not presented: pre-solving step
     - Communication and monitoring framework
     - Describe repair rules by graph transformation systems

  15. Appendix

  16. Simple Redeployment
     - [Figure: deployment diagram with the components vot:Voter, pc1/pc2/pc3:Position Calculation, cc:Convoy, mul:Multiplier and gps:GPS-Controller, and the nodes Taliesin, Avalon, Uther, Gareth, Gorlois and Arthur]

  17. Example
     - [Figure: deployment diagram with the components vot:Voter, pc1/pc2/pc3:Position Calculation, cc:Convoy, mul:Multiplier and gps:GPS-Controller, and the nodes Taliesin, Avalon, Uther, Gareth, Gorlois and Arthur, annotated with memory values (Mem) between 0.25 Mb and 2.5 Mb; pc1 and pc2 each appear twice]

  18. Damage Calculation
     - [Figure: components C1 to C5 on nodes n1 to n5; two labels read damage=13]
     - Damage values: all = 13, 2 of 3 = 4, 1 of 3 = 1
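
The damage values on this slide can be turned into a small calculation. The Python sketch below assumes one plausible reading: a group contributes 1 when one of its three redundant copies is unavailable, 4 when two are unavailable, and 13 when all copies (or a non-redundant component) are unavailable, and the total damage is the sum over groups. The summation, the function names and the example groups are assumptions, not taken from the slides.

```python
# Damage values from the slide; their interpretation per group is an assumption.
DAMAGE_ONE_OF_THREE = 1
DAMAGE_TWO_OF_THREE = 4
DAMAGE_ALL = 13


def group_damage(members, unavailable):
    """Damage of one group; intended for groups of size 1 or 3."""
    down = sum(1 for m in members if m in unavailable)
    if down == 0:
        return 0
    if down == len(members):
        return DAMAGE_ALL                        # every copy is unavailable
    return {1: DAMAGE_ONE_OF_THREE, 2: DAMAGE_TWO_OF_THREE}[down]


def total_damage(groups, unavailable):
    """Sum the damage over all groups (assumed aggregation)."""
    return sum(group_damage(members, unavailable) for members in groups.values())


# Hypothetical groups: one triple-redundant component and one single component.
groups = {"pc": ["pc1", "pc2", "pc3"], "gps": ["gps"]}
print(total_damage(groups, {"pc1", "gps"}))      # prints 14 (1 + 13)
```

The steep increase from 1 to 4 to 13 expresses that each further lost copy is worse than the previous one, which is what makes the last remaining copies the least attractive candidates for migration.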

  19. Submodel Expansion
     - [Figure: stepwise expansion of the submodel over components a to g, which are marked as failed or running and sorted into the columns "Submodel", "Consider" and "Consider later"]
       - Initial situation
       - Step 1: submodel not solvable
       - Step 2: redundant copies
       - Step 3: not related
       - Step 4: submodel not solvable

  20. Submodel Expansion (2)
     - [Figure: continuation of the stepwise expansion over components a to g, marked as failed or running]
       - Step 4: submodel not solvable
       - Step 5: redundant copies
       - Step 6
       - Step 7: submodel solvable

  21. Pre-Solving

  22. Foundations (TMR)
     - Use fault tolerance techniques to ensure dependability
     - Triple Modular Redundancy (TMR)
     - [Figure: TMR structure with :Provider, :Multiplier, :Component1, :Component2, :Component3, :Voter and :User]

  23. Foundations (TMR)
     - Deployment constraints for TMR
       - Avoid single-point-of-failure of voter / multiplier -> deploy voter and user to the same node (if the user fails, the failure of the voter is no problem)
       - Avoid crash failures -> deploy redundant components to distinct nodes
       - Heterogeneous hardware platform -> require different CPUs: { Node3.CPU ≠ Node4.CPU ∧ Node4.CPU ≠ Node5.CPU ∧ Node3.CPU ≠ Node5.CPU }
     - [Figure: Node1 with :Provider and :Multiplier, Node2 with :Voter and :User, Node3, Node4 and Node5 with :Component1, :Component2 and :Component3]
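
The three TMR deployment constraints can be checked mechanically for a given deployment. The following Python sketch does so for the deployment shown in the figure; the function name `tmr_violations`, the message texts and the CPU types are hypothetical, and the check only covers the constraints stated on this slide.

```python
from itertools import combinations

REDUNDANT = ["Component1", "Component2", "Component3"]


def tmr_violations(deployment, cpu):
    """Return a list of violated TMR deployment constraints (empty if none)."""
    violations = []
    # Voter and user must share a node.
    if deployment["Voter"] != deployment["User"]:
        violations.append("Voter and User are on different nodes")
    for a, b in combinations(REDUNDANT, 2):
        # Redundant components must be on distinct nodes.
        if deployment[a] == deployment[b]:
            violations.append(f"{a} and {b} share a node")
        # Their nodes must have pairwise different CPUs.
        if cpu[deployment[a]] == cpu[deployment[b]]:
            violations.append(f"{a} and {b} run on the same CPU type")
    return violations


# Deployment as read from the figure; the CPU types are made up for this example.
deployment = {
    "Provider": "Node1", "Multiplier": "Node1",
    "Voter": "Node2", "User": "Node2",
    "Component1": "Node3", "Component2": "Node4", "Component3": "Node5",
}
cpu = {"Node1": "x86", "Node2": "x86", "Node3": "x86", "Node4": "PowerPC", "Node5": "ARM"}
print(tmr_violations(deployment, cpu))           # prints []
```

In a constraint-solver setting the same conditions would be posted as constraints over the placement variables rather than checked after the fact.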

  24. Questions?

  25. Online Redeployment: Our Solution
     - Compute Self-Repair Action
       - Find suitable nodes to redeploy failed components
       - How to find suitable nodes?
       - What to do if there is no suitable node?
         - Redeploy further (still running) components
     - Goal: reduce costs
       - Redeployment should not decrease dependability (reduce damage)
       - Reduce solving time
