Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems



SLIDE 1

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems

Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping University, Sweden

SLIDE 2

Motivation

Hard real-time applications
  • Timing constraints
  • Cost constraints

Online preemptive (flexible) vs. off-line non-preemptive (predictable)

Faults
  • Transient
  • Intermittent

Hardware solutions
  • MARS, TTA, X-by-Wire
  • Permanent faults
  • Costly for transient faults

vs.

Software solutions
  • Re-execution/rollback recovery
  • Checkpointing/rollback recovery
  • Replication, primary-backup…

SLIDE 3

Outline

  • Motivation
  • System architecture and fault-model
  • Fault-tolerance techniques
  • Problem formulation
  • Motivational examples
  • Tabu-search optimization strategy
  • Experimental results
  • Contributions and Message

SLIDE 4

Fault-Tolerant Time-Triggered Systems

Processes
  • Static cyclic scheduling
  • Transient faults: re-execution and replication

Messages
  • Static schedule table
  • Transient faults: fault-tolerant protocol

Time Triggered Protocol (TTP)
  • Bus access scheme: time-division multiple-access (TDMA)
  • Schedule table located in each TTP controller: message descriptor list (MEDL)

[Figure: TDMA bus cycle of two rounds; each round consists of slots S1 to S4]
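
The TDMA bus access named on this slide can be illustrated with a small timing calculation. The sketch below is only an illustration: the class `TdmaRound`, the function `next_slot_start`, the slot lengths and the one-slot-per-node layout are all assumptions, not taken from the talk. It computes the earliest time at which a node's slot starts once a message is ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaRound:
    """One TDMA round: an ordered tuple of (node, slot_length) pairs (hypothetical model)."""
    slots: tuple  # e.g. (("N1", 10), ("N2", 10))

    @property
    def length(self) -> int:
        return sum(length for _, length in self.slots)

    def slot_offset(self, node: str) -> int:
        """Offset of the node's slot from the start of a round."""
        offset = 0
        for owner, length in self.slots:
            if owner == node:
                return offset
            offset += length
        raise KeyError(f"node {node!r} has no slot in this round")


def next_slot_start(tdma: TdmaRound, node: str, ready_time: int) -> int:
    """Earliest start of the node's slot at or after ready_time."""
    offset = tdma.slot_offset(node)
    # Ceiling division: whole rounds that must pass before the slot is usable.
    rounds = max(0, -(-(ready_time - offset) // tdma.length))
    return rounds * tdma.length + offset


tdma = TdmaRound((("N1", 10), ("N2", 10)))
```

With these assumed slot lengths, a message that becomes ready on N2 just after its slot has started has to wait for the slot in the following round, which is why message timing depends on the static schedule.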

SLIDE 5

Fault-Tolerant Techniques

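[Figure: three ways of protecting process P1; figure label: 2]
  • Re-execution: P1 shown three times on node N1
  • Replication: P1 shown once on each of nodes N1, N2, N3
  • Re-executed replicas: P1 shown three times across nodes N1 and N2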
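
The three techniques on this slide can be related by a simple counting argument. The sketch below assumes, as a simplification not stated on the slide, that each transient fault invalidates exactly one execution of a process, so that k + 1 executions in total are enough to tolerate k faults. The function names and the `recovery_overhead` parameter are hypothetical.

```python
def reexecution_worst_case(wcet: int, k: int, recovery_overhead: int = 0) -> int:
    """Worst-case time on one node when each of k faults forces one re-execution."""
    return wcet + k * (wcet + recovery_overhead)


def replication_nodes_needed(k: int) -> int:
    """Replicas on distinct nodes so that at least one survives k faults."""
    return k + 1


def policy_options(k: int):
    """All (replicas, re-executions) splits whose executions total k + 1."""
    return [(replicas, k + 1 - replicas) for replicas in range(1, k + 2)]


options_for_two_faults = policy_options(2)
```

For k = 2, `policy_options` lists pure re-execution as (1, 2), re-executed replicas as (2, 1) and pure replication as (3, 0). Re-execution costs time on one node, replication costs nodes and bus traffic, and the mixed option trades between the two.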

SLIDE 6

Problem Formulation

Given
  • Application: set of process graphs
      • WCETs, message sizes, periods, deadlines
  • Architecture: time-triggered system
  • Fault model: transient faults
      • Number of transient faults in the system period

Determine
  • Schedulable and fault-tolerant design implementation
      • Fault-tolerance policy assignment
      • Mapping of processes and messages
      • Schedule tables for processes and messages
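
The inputs and outputs listed above can be written down as plain data structures. This is a minimal sketch with hypothetical class and field names; it only mirrors the items on the slide and says nothing about how the authors represent them. Schedule tables are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    REEXECUTION = "re-execution"
    REPLICATION = "replication"
    REEXECUTED_REPLICAS = "re-executed replicas"


@dataclass
class Process:
    name: str
    wcet: dict  # node name -> worst-case execution time (hypothetical layout)


@dataclass
class Message:
    name: str
    sender: str
    receiver: str
    size: int


@dataclass
class Application:
    processes: list
    messages: list
    period: int
    deadline: int


@dataclass
class Design:
    """What has to be determined: a policy and a mapping for every process."""
    policy: dict = field(default_factory=dict)   # process name -> Policy
    mapping: dict = field(default_factory=dict)  # process name -> list of node names

    def is_complete(self, app: Application) -> bool:
        names = {p.name for p in app.processes}
        return names <= set(self.policy) and names <= set(self.mapping)


app = Application(
    processes=[Process("P1", {"N1": 40, "N2": 50}), Process("P2", {"N1": 60, "N2": 60})],
    messages=[Message("m1", "P1", "P2", size=4)],
    period=200,
    deadline=160,
)
design = Design(policy={"P1": Policy.REEXECUTION}, mapping={"P1": ["N1"]})
```

The period, deadline and message size in the example are made-up values chosen only to make the structures concrete.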

SLIDE 7

Static Scheduling [Kandasamy et al. 03]

[Figure: process graph with processes P1 to P5 and messages m1, m2, mapped on nodes N1, N2, N3; figure label: 2. Schedule labels S1 to S15 and S18 appear across N1, N2 and N3.]
  • Root schedules
  • Contingency schedules
  • Transparent re-execution
  • Recovery slack
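
The slide names recovery slack without saying how it is sized. A minimal sketch follows, under two assumptions that are not taken from the slide: all processes scheduled on a node share one slack, and each of the k faults triggers exactly one re-execution of the process it hits. The function name `worst_case_finish` is hypothetical.

```python
def worst_case_finish(wcets, k):
    """Worst-case finish time of a node's process sequence with shared recovery slack.

    Assumption: each fault re-runs exactly one process, so k faults cost at
    most k times the longest WCET on the node.
    """
    if not wcets:
        return 0
    recovery_slack = k * max(wcets)
    return sum(wcets) + recovery_slack


one_fault = worst_case_finish([30, 20, 40], k=1)
```

Under these assumptions the slack grows with the longest process on the node rather than with the sum of all processes, which keeps the fault-free schedule and the faulty schedule close together.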

SLIDE 8

Re-execution vs. Replication

Worst-case execution times (figure label: 1):

      N1   N2
P1    40   50
P2    40   60
P3    50   70

[Figure: schedules on N1, N2 and the TTP bus (slots S1, S2) for two applications, A1 (P1, P2, P3, m1) and A2 (P1, P2, P3, m1, m2); the panels carry the outcome labels "Met" and "Missed" relative to the deadline]
  • Replication is better
  • Re-execution is better

SLIDE 11

Fault-Tolerant Policy Assignment

Worst-case execution times (figure label: 1):

      N1   N2
P1    40   50
P2    60   60
P3    80   80
P4    40   50

[Figure: process graph with P1, P2, P3, P4 and messages m1, m2, m3; schedules on N1, N2 and the TTP bus (slots S1, S2), compared against the deadline]
  • No fault-tolerance: application crashes
  • Two of the schedules are labelled "Missed"
  • Optimization of fault-tolerance policy assignment: the schedule is labelled "Met"

SLIDE 13

Mapping and Fault-Tolerance

Worst-case execution times (figure label: 1):

      N1   N2
P1    40   X
P2    60   60
P3    40   70
P4    X    70

[Figure: process graph with P1, P2, P3, P4 and messages m1, m2, m3, m4; schedules on N1, N2 and the TTP bus (slots S1, S2), compared against the deadline]
  • Best mapping without considering fault-tolerance: deadline missed
  • Simultaneous mapping and fault-tolerance: deadline met

SLIDE 14

Optimization Strategy

  • Design optimization:
      • Fault-tolerance policy assignment
      • Mapping of processes and messages
      • Root schedules
  • Three tabu-search optimization algorithms:
      1. Mapping and Fault-Tolerance Policy assignment (MRX): re-execution, replication or both
      2. Mapping and only Re-Execution (MX)
      3. Mapping and only Replication (MR)

[Figure labels: Tabu-search, List scheduling]
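
The general shape of a tabu search can be sketched as follows. This is a generic textbook-style loop with an aspiration criterion, not the authors' MRX, MX or MR algorithm: the function `tabu_search`, its parameters and the toy load-balancing cost are all assumptions made for illustration. In the talk the cost presumably comes from scheduling each candidate; here a simple "largest node load" stands in for it, and the Wait counters of the worked example are not modelled.

```python
def tabu_search(initial, neighbours, cost, iterations=100, tenure=3):
    """Generic tabu search with an aspiration criterion (illustrative sketch).

    neighbours(solution) yields (move, new_solution) pairs, where a move is
    any hashable label such as ("remap", "P2").
    """
    current = best = initial
    best_cost = cost(initial)
    tabu = {}  # move -> remaining iterations during which it stays tabu

    for _ in range(iterations):
        candidates = []
        for move, solution in neighbours(current):
            c = cost(solution)
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if tabu.get(move, 0) == 0 or c < best_cost:
                candidates.append((c, move, solution))
        if not candidates:
            break
        c, move, current = min(candidates, key=lambda item: item[0])
        # Age the tabu list, then forbid the move that was just taken.
        tabu = {m: t - 1 for m, t in tabu.items() if t > 1}
        tabu[move] = tenure
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


# Toy instance: assign four processes to two nodes, minimising the larger load.
WCET = {"P1": 40, "P2": 60, "P3": 75, "P4": 40}
NAMES = list(WCET)


def cost(mapping):
    loads = [0, 0]
    for name, node in zip(NAMES, mapping):
        loads[node] += WCET[name]
    return max(loads)


def neighbours(mapping):
    for i, name in enumerate(NAMES):
        flipped = list(mapping)
        flipped[i] = 1 - flipped[i]  # move the process to the other node
        yield ("remap", name), tuple(flipped)


best, best_cost = tabu_search((0, 0, 0, 0), neighbours, cost)
```

The tabu list keeps the search from undoing its most recent moves, while the aspiration test still lets a forbidden move through when it improves on the best solution found so far.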

SLIDE 19

MRX Tabu-Search Example

Worst-case execution times (figure label: 1):

      N1   N2
P1    40   50
P2    60   60
P3    75   75
P4    40   50

[Figure: process graph with P1, P2, P3, P4 and messages m1, m2, m3; each panel shows a schedule on N1, N2 and the TTP bus (slots S1, S2) together with Tabu and Wait counters for P1 to P4]
  • Current solution
  • Design transformations:
      • Tabu move & worse than best-so-far
      • Tabu move & better than best-so-far
      • Non-tabu & worse than best-so-far (two panels)
  • Current solution after the step: same schedule as the "tabu move & better than best-so-far" panel

SLIDE 20

Experimental Results

[Figure: schedulability improvement under resource constraints. x-axis: number of processes; y-axis: average % deviation from MRX. Curves: mapping and replication (MR), mapping and re-execution (MX), mapping and policy assignment (MRX).]

Case study
  • Vehicle cruise controller
  • MRX: schedulable fault-tolerant application with 65% overhead
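
The plot's y-axis is an average percentage deviation from MRX. The slide does not say which quantity is compared, so the sketch below assumes it is a schedule length; the function names and the sample numbers are hypothetical and are not the experimental data.

```python
def percent_deviation(value: float, reference: float) -> float:
    """Deviation of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


def average_percent_deviation(values, references) -> float:
    """Mean percentage deviation over paired (value, reference) results."""
    pairs = list(zip(values, references))
    return sum(percent_deviation(v, r) for v, r in pairs) / len(pairs)


# Made-up example: two applications, MX result versus MRX result.
example = average_percent_deviation([110, 130], [100, 100])
```

By this definition MRX itself sits at 0%, and a positive value means the compared strategy produced a longer result than MRX on average.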

SLIDE 21

Contributions and Message

Contributions
  • Combined re-execution and replication
  • Optimization algorithms for fault-tolerance policy assignment
  • Efficient contingency schedule generation

Message
  • Optimization of fault-tolerance policy assignment is needed for cost-effective fault tolerance