Computing Optimal Self-Repair Actions: Damage Minimization versus Repair Time


SLIDE 1

University of Paderborn

Software Engineering Group

  • Prof. Dr. Wilhelm Schäfer

Daniela Schilling – May 2005

Computing Optimal Self-Repair Actions: Damage Minimization versus Repair Time

Matthias Tichy, Holger Giese, Daniela Schilling, Wladimir Pauls

SLIDE 2

Motivation

www.railcab.de

SLIDE 3

Motivation

  • Redundant implementations of important software components
  • Required: reconfiguration
  • Given: a mechanism that automatically detects failed components
  • Self-Repair Actions: automatic calculation of a redeployment for failed components

[Figure: deployment of the components pc1, pc2, pc3:Position Calculation, vot:Voter, gps:GPS-Controller, cc:Convoy and mul:Multiplier on the nodes Avalon, Taliesin, Uther, Gareth, Gorlois and Arthur]

SLIDE 4

Initial Deployment

  • Map deployment constraints, given as extended UML Deployment Diagrams, to inequalities over boolean and integer variables
  • Use a constraint solver to calculate the initial deployment

[Figure: extended UML Deployment Diagram showing pc1:Position Calculation, pc2:Position Calculation, Node1 and Node2, with the annotation pc1.mem=2.0Mb]

Reference: Matthias Tichy, Daniela Schilling, Holger Giese: Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns. WOSS/FSE 2004.
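The mapping from deployment constraints to inequalities over boolean variables can be illustrated with a small sketch. It is not the encoding used by the authors. It assumes one 0/1 variable per (component, node) pair, a memory capacity inequality per node, and a "distinct nodes" inequality for redundant copies. The node capacities and the brute-force search (standing in for a real constraint solver) are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical input data. Only pc1.mem=2.0Mb appears on the slide;
# the remaining values are illustrative assumptions.
NODE_MEM = {"Node1": 2.5, "Node2": 2.0}   # available memory per node in Mb
COMP_MEM = {"pc1": 2.0, "pc2": 1.5}       # memory demand per component in Mb
DISTINCT = [("pc1", "pc2")]               # redundant copies must not share a node


def satisfies(x):
    """Check all inequalities for a 0/1 assignment x[(component, node)]."""
    # Placement: sum over nodes of x[c, n] == 1 for every component c
    for c in COMP_MEM:
        if sum(x[c, n] for n in NODE_MEM) != 1:
            return False
    # Capacity: sum over components of mem(c) * x[c, n] <= mem(n)
    for n in NODE_MEM:
        if sum(COMP_MEM[c] * x[c, n] for c in COMP_MEM) > NODE_MEM[n]:
            return False
    # Distinct nodes: x[a, n] + x[b, n] <= 1 for every node n
    for a, b in DISTINCT:
        for n in NODE_MEM:
            if x[a, n] + x[b, n] > 1:
                return False
    return True


def solve():
    """Enumerate all 0/1 assignments and return the first valid deployment."""
    keys = [(c, n) for c in COMP_MEM for n in NODE_MEM]
    for bits in product((0, 1), repeat=len(keys)):
        x = dict(zip(keys, bits))
        if satisfies(x):
            return {c: n for (c, n), bit in x.items() if bit}
    return None


deployment = solve()
print(deployment)  # a mapping component -> node that satisfies all inequalities
```

Exhaustive enumeration grows exponentially with the number of variables, which is why a dedicated constraint solver is used for realistic system sizes.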

SLIDE 5

Online Redeployment

  • Node crash failure ⇒ all components running on this node fail too
  • Compute Self-Repair Action
      ⇒ Find suitable nodes to redeploy failed components
  • How to find suitable nodes?
  • What to do if there is no suitable node?
      • Redeploy further (still running) components
  • Damage: negative effects of unavailable components
  • Costs
  • Goal: minimize costs
      • Keep damage as low as possible
      • Reduce solving time

[Figure: costs plotted as damage over time for the phases "calculate redeployment" and "perform redeployment"; legend: failed components, components to be migrated]

SLIDE 6

Online Redeployment: 1. Solution

  • Remove crashed nodes from the constraint system
  • Solve the complete constraint system again

[Figure: plot with axes damage and time]

SLIDE 7

Online Redeployment: 2. Solution

  • Remove crashed nodes from the constraint system
  • Add an objective function (minimize the damage caused by migration of running components) to the constraint system
  • Solve the complete system again

[Figure: plot with axes damage and time]
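A minimal sketch of re-solving the complete system with an objective function might look as follows. The component names, memory values and per-component migration damage are hypothetical. Exhaustive enumeration stands in for the constraint solver, and memory capacity is the only constraint modelled.

```python
from itertools import product

# Hypothetical example data (not taken from the talk).
NODE_MEM = {"n1": 3.0, "n2": 2.0, "n3": 2.0}           # memory per node in Mb
COMP_MEM = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}    # memory per component in Mb
MIGRATION_DAMAGE = {"a": 0, "b": 5, "c": 2, "d": 3}    # damage if a running component moves


def feasible(deployment, node_mem):
    """Capacity constraint: memory used on each node must not exceed its memory."""
    used = {n: 0.0 for n in node_mem}
    for comp, node in deployment.items():
        used[node] += COMP_MEM[comp]
    return all(used[n] <= node_mem[n] for n in node_mem)


def resolve_with_objective(current, crashed):
    """Remove the crashed node, then search all deployments for minimal damage."""
    nodes = {n: m for n, m in NODE_MEM.items() if n != crashed}
    comps = sorted(current)
    best, best_damage = None, None
    for choice in product(sorted(nodes), repeat=len(comps)):
        candidate = dict(zip(comps, choice))
        if not feasible(candidate, nodes):
            continue
        # Only still-running components that change their node cause damage.
        damage = sum(MIGRATION_DAMAGE[c] for c in comps
                     if current[c] != crashed and candidate[c] != current[c])
        if best is None or damage < best_damage:
            best, best_damage = candidate, damage
    return best, best_damage


current = {"a": "n3", "b": "n1", "c": "n1", "d": "n2"}
best, damage = resolve_with_objective(current, crashed="n3")
print(best, damage)  # {'a': 'n1', 'b': 'n1', 'c': 'n2', 'd': 'n2'} 2
```

Because every component is a candidate for migration, the search space covers the whole system, which illustrates why this variant finds low-damage solutions but takes long to solve.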

SLIDE 8

Online Redeployment: Our Approach

  • Remove crashed nodes from the constraint system
  • Add objective function (minimize damage) to the constraint system
  • Try to solve the constraint system for the failed components only
  • Until a solution is found: extend the set of components that have to be redeployed/migrated
  • Use constraint solver
  • Heuristic approach
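The incremental idea can be sketched as a loop: solve a submodel that contains only the failed components, and if that fails, move one more running component into the submodel and try again. The data, the function names and the expansion order below are hypothetical. The sketch also simplifies the approach by accepting the first feasible placement instead of minimizing damage inside the submodel.

```python
from itertools import product

# Hypothetical example data; the crashed node has already been removed.
NODE_MEM = {"n1": 3.0, "n2": 2.0}                      # memory per node in Mb
COMP_MEM = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}    # memory per component in Mb


def solve_submodel(fixed, movable):
    """Place the movable components while the fixed ones stay where they are."""
    free = dict(NODE_MEM)
    for comp, node in fixed.items():
        free[node] -= COMP_MEM[comp]
    for choice in product(sorted(NODE_MEM), repeat=len(movable)):
        used = {n: 0.0 for n in NODE_MEM}
        for comp, node in zip(movable, choice):
            used[node] += COMP_MEM[comp]
        if all(used[n] <= free[n] for n in NODE_MEM):
            return dict(zip(movable, choice))
    return None


def repair(running, failed, expansion_order):
    """Grow the set of components to redeploy until the submodel is solvable."""
    movable = list(failed)
    fixed = dict(running)
    candidates = list(expansion_order)
    while True:
        solution = solve_submodel(fixed, movable)
        if solution is not None:
            return {**fixed, **solution}, movable
        if not candidates:
            return None, movable          # nothing left to add: no repair found
        extra = candidates.pop(0)         # heuristic choice of the next component
        movable.append(extra)
        del fixed[extra]


running = {"b": "n1", "c": "n1", "d": "n2"}
deployment, redeployed = repair(running, failed=["a"], expansion_order=["c", "d", "b"])
print(deployment)   # {'b': 'n1', 'd': 'n2', 'a': 'n1', 'c': 'n2'}
print(redeployed)   # ['a', 'c']
```

In this run the failed component alone cannot be placed, so one running component is added to the submodel; the other two running components are never touched by the solver.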
SLIDE 9

Online Redeployment: Our Approach

[Figure: plot with axes damage and time]

SLIDE 10

Choosing Components for Redeployment

  • Example: 3 redundant copies of important components
  • Algorithm:
      • Try to redeploy the failed component
      • Until redeployment is possible:
          • 1. Choose components which are not redundant copies of failed components
          • 2. Choose components where only one of three redundant copies has already failed
          • 3. Choose arbitrary components
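One way to read the three selection steps is as a classification of the running components into tiers that are tried in order. The sketch below is an interpretation, not the authors' implementation. The function name `expansion_tiers` and the redundancy group `q1`, `q2`, `q3` are hypothetical; `pc1` to `pc3`, `vot` and `gps` are borrowed from the example deployment.

```python
def expansion_tiers(running, failed, groups):
    """Sort running components into the three tiers of the selection heuristic.

    Tier 1: components that are not redundant copies of a failed component.
    Tier 2: components whose redundancy group has lost exactly one copy.
    Tier 3: all remaining components.
    """
    failed = set(failed)
    tiers = ([], [], [])
    for comp in running:
        # A component without a redundancy group forms a group of its own.
        group = next((g for g in groups if comp in g), {comp})
        failed_copies = len(failed & set(group))
        if failed_copies == 0:
            tiers[0].append(comp)
        elif failed_copies == 1:
            tiers[1].append(comp)
        else:
            tiers[2].append(comp)
    return tiers


groups = [{"pc1", "pc2", "pc3"}, {"q1", "q2", "q3"}]
tiers = expansion_tiers(
    running=["pc2", "pc3", "q3", "vot", "gps"],
    failed=["pc1", "q1", "q2"],
    groups=groups,
)
print(tiers)  # (['vot', 'gps'], ['pc2', 'pc3'], ['q3'])
```

Migrating `q3` is left for last because it is the only surviving copy of its group, so making it temporarily unavailable would take out the whole group.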
SLIDE 12

Experiment

Scenario:

  • 36 nodes with 114 links
  • 72 components with 99 connectors
  • 5 node-specific (CPU, OS, Memory, Utilization, HDD) and 2 link-specific (Bandwidth, Loss) deployment restrictions
  • Set of deployment constraints on components and connectors

Experiment:

  • Randomly selected a node and let it fail

SLIDE 13

Experimental Results

            1. Solution           2. Solution           Our Algorithm
Test Nr.    Time (ms)   Damage    Time (ms)   Damage    Time (ms)   Damage
1           13630       773       > 1h        N/A       50          7
2           14890       97        56060       29        30          30
3           13790       4         14920       1         10          5
4           13660       34        16430       31        50          34

[Figure: plot with axes damage and time]

SLIDE 14

Conclusion & Future Work

  • Algorithm to calculate optimal self-repair actions
  • Deployment constraints solved by a standard constraint solver
  • Experiment showed that the algorithm is nearly optimal in damage minimization and time consumption
  • Not presented: pre-solving step
  • Communication and monitoring framework
  • Describe repair rules by graph transformation systems

SLIDE 15

Appendix

SLIDE 16

Simple Redeployment

[Figure: deployment of the components pc1, pc2, pc3:Position Calculation, vot:Voter, gps:GPS-Controller, cc:Convoy and mul:Multiplier on the nodes Avalon, Taliesin, Uther, Gareth, Gorlois and Arthur]

SLIDE 17

Example

Nodes:

  • Avalon, Mem=2.5Mb
  • Taliesin, Mem=1.5Mb
  • Uther, Mem=1Mb
  • Gareth, Mem=2Mb
  • Gorlois, Mem=2Mb
  • Arthur, Mem=1.5Mb

Components:

  • gps:GPS-Controller, Mem=0.5Mb
  • vot:Voter, Mem=0.5Mb
  • cc:Convoy, Mem=0.7Mb
  • mul:Multiplier, Mem=0.25Mb
  • pc1:Position Calculation, Mem=2Mb
  • pc2:Position Calculation, Mem=1.5Mb
  • pc3:Position Calculation

SLIDE 18

Damage Calculation

[Figure: components C1, C2, C3, C4, C5 on nodes n1, n2, n3, n4, n5, annotated with damage values: damage=13; damage=13; damage: all=13, 2of3=4, 1of3=1]
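The damage annotations suggest a graded damage function for a group of three redundant copies (1of3=1, 2of3=4, all=13) and a flat damage of 13 for a component without redundancy. The sketch below sums these values for a set of unavailable components. Which of C1 to C5 form the redundant group cannot be recovered from the slide text, so the grouping used here is an assumption, as is the damage of 0 when no copy has failed.

```python
# Damage values from the slide: 1of3=1, 2of3=4, all=13.
# The entry for 0 failed copies is an assumption.
GROUP_DAMAGE = {0: 0, 1: 1, 2: 4, 3: 13}
SINGLE_DAMAGE = 13  # damage=13 for a component without redundant copies

# Assumed grouping (not recoverable from the slide text).
SINGLES = ["C1", "C5"]
GROUPS = [("C2", "C3", "C4")]


def total_damage(unavailable):
    """Sum the damage caused by a set of unavailable components."""
    unavailable = set(unavailable)
    damage = sum(SINGLE_DAMAGE for c in SINGLES if c in unavailable)
    for group in GROUPS:
        failed_copies = sum(1 for c in group if c in unavailable)
        damage += GROUP_DAMAGE[failed_copies]
    return damage


print(total_damage(["C2"]))              # 1
print(total_damage(["C2", "C3"]))        # 4
print(total_damage(["C2", "C3", "C4"]))  # 13
print(total_damage(["C1", "C2"]))        # 14
```

The non-linear steps express that losing one redundant copy is cheap, while losing the last copy is as damaging as losing a component that has no redundancy at all.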

SLIDE 19

Submodel Expansion

[Figure: expansion steps over the components a to g; legend: failed components, running components, submodel, consider, consider later]

  • Initial situation: a b c d e f g
  • 1) a b c d e f g; submodel not solvable
  • 2) a b c d e f g; redundant copies
  • 3) a b c d e f g; not related; submodel not solvable
  • 4) a b c e d f g

SLIDE 20

Submodel Expansion (2)

[Figure: continued expansion steps; legend: failed components, running components]

  • 4) a b c e d f g; submodel not solvable
  • 5) a b c e d f g; redundant copies
  • 6) a b c e d f g
  • 7) a b c e f g d; submodel solvable

SLIDE 21

Pre-Solving

SLIDE 22

Foundations (TMR)

  • Use fault tolerance techniques to ensure dependability
  • Triple Modular Redundancy (TMR)

[Figure: TMR structure with :Provider, :Multiplier, :Component1, :Component2, :Component3, :Voter and :User]

SLIDE 23

Foundations (TMR)

Deployment constraints for TMR

  • Avoid crash failures
      ⇒ Deploy redundant components to distinct nodes
  • Avoid single-point-of-failure of voter / multiplier
      ⇒ Deploy voter and user to same node (if the user fails, the failure of the voter is no problem)
  • Heterogeneous hardware platform
      ⇒ Require different CPU: { Node3.CPU ≠ Node4.CPU, Node4.CPU ≠ Node5.CPU, Node3.CPU ≠ Node5.CPU }

[Figure: :Provider, :Multiplier, :Component1, :Component2, :Component3, :Voter and :User deployed on Node1 to Node5]
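The three TMR deployment constraints can be checked for a concrete deployment with a few set comparisons. The sketch below is illustrative only: the function name, the example deployment and the CPU type names are hypothetical, and only the constraints stated on the slide are checked.

```python
REPLICAS = ("Component1", "Component2", "Component3")


def tmr_violations(deployment, cpu_of_node):
    """Return the list of violated TMR deployment constraints (empty if none)."""
    violations = []
    replica_nodes = [deployment[r] for r in REPLICAS]
    # Avoid crash failures: redundant components on pairwise distinct nodes.
    if len(set(replica_nodes)) != len(REPLICAS):
        violations.append("redundant components share a node")
    # Avoid a single point of failure: voter and user on the same node.
    if deployment["Voter"] != deployment["User"]:
        violations.append("voter and user are on different nodes")
    # Heterogeneous hardware: replica nodes must have pairwise different CPUs.
    replica_cpus = [cpu_of_node[n] for n in replica_nodes]
    if len(set(replica_cpus)) != len(REPLICAS):
        violations.append("redundant components run on the same CPU type")
    return violations


# Hypothetical deployment and CPU types, for illustration only.
cpu_of_node = {"Node1": "x86", "Node2": "x86",
               "Node3": "PowerPC", "Node4": "x86", "Node5": "ARM"}
good = {"Provider": "Node1", "Multiplier": "Node1",
        "Component1": "Node3", "Component2": "Node4", "Component3": "Node5",
        "Voter": "Node2", "User": "Node2"}
bad = dict(good, Component2="Node3", Voter="Node1")

print(tmr_violations(good, cpu_of_node))       # []
print(len(tmr_violations(bad, cpu_of_node)))   # 3
```

A checker of this kind only validates a given deployment; finding a valid one is the job of the constraint solver.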

SLIDE 24

www. .de

Questions?

SLIDE 25

Online Redeployment: Our Solution

  • Compute Self-Repair Action
      ⇒ Find suitable nodes to redeploy failed components
  • How to find suitable nodes?
  • What to do if there is no suitable node?
      • 2) Redeploy further (still running) components
  • Goal: reduce costs
      • Redeployment should not decrease dependability (reduce damage)
      • Reduce solving time