Dependability Evaluation: Techniques for Dependability Evaluation



slide-1
SLIDE 1

Dependability Evaluation

slide-2
SLIDE 2

Techniques for Dependability Evaluation

The dependability evaluation of a system can be carried out either:

  • experimentally (heuristic): a system prototype is built and empirical statistical data are used to evaluate the system’s metrics:
      • far more expensive and complex than the analytical approach
      • building a system prototype may be impossible
      • experimental evaluation of dependability requires long observation periods
  • analytically: dependability metrics are obtained from a mathematical model of the system:
      • mathematical models may not adequately represent the real system’s structure or the behavior of its components
      • simulation models may be a helpful complementary tool
slide-3
SLIDE 3

Fundamental Definitions

  • Failure Function Q(t):
      – probability that a component fails for the first time in the time interval (0, t)
      – it is a cumulative distribution function:

$$Q(0) = 0, \qquad 0 \le Q(t) \le Q(t + \Delta t) \ \text{ for } \Delta t \ge 0, \qquad \lim_{t \to +\infty} Q(t) = 1$$

slide-4
SLIDE 4

Fundamental Definitions (cont’d)

  • Reliability Function R(t):
      – probability that a component functions correctly in the time interval (0, t)

$$R(0) = 1, \qquad 1 \ge R(t) \ge R(t + \Delta t) \ \text{ for } \Delta t \ge 0, \qquad \lim_{t \to +\infty} R(t) = 0$$

$$R(t) = 1 - Q(t)$$

slide-5
SLIDE 5

Fundamental Definitions (cont’d)

  • Failure probability density function q(t): it is the derivative of Q(t) when Q(t) is a continuous function:

$$q(t) = \frac{dQ(t)}{dt}$$

  • R(t) is continuous too, and its derivative over time r(t) is equal to:

$$r(t) = \frac{dR(t)}{dt} = \frac{d(1 - Q(t))}{dt} = -\frac{dQ(t)}{dt} = -q(t)$$

  • R(t) and Q(t) are evaluated experimentally by analyzing the behavior of a sufficiently large population and determining the failure rate:
      • N: population at time t = 0
      • n(t): correct components at time t

$$R(t) = \frac{n(t)}{N}$$
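As a minimal sketch of the experimental estimate R(t) = n(t)/N, the snippet below counts how many components of a population are still working at time t. The function name `empirical_reliability` and the failure times are hypothetical, chosen only for illustration.

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) = n(t) / N from the first-failure time of each component."""
    population = len(failure_times)                          # N: population at t = 0
    survivors = sum(1 for ft in failure_times if ft > t)     # n(t): still correct at t
    return survivors / population


# Hypothetical first-failure times (hours) of five components.
failure_times = [120.0, 340.0, 560.0, 610.0, 900.0]

for t in (0.0, 500.0, 1000.0):
    r = empirical_reliability(failure_times, t)
    print(f"t = {t:6.1f} h  R(t) = {r:.2f}  Q(t) = {1.0 - r:.2f}")
```

With a real population the estimate only becomes meaningful when N is sufficiently large, as the slide notes.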

slide-6
SLIDE 6

Average Failure Frequency

Average failure frequency during the time interval (t, t + Δt):

$$\frac{n(t) - n(t + \Delta t)}{\Delta t}$$

Average failure frequency of a single unit in the time interval (t, t + Δt):

$$\frac{1}{n(t)} \cdot \frac{n(t) - n(t + \Delta t)}{\Delta t}$$
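The two averages can be illustrated with a short calculation. The counts and the interval length below are made-up values, and the function names are hypothetical.

```python
def average_failure_frequency(n_t, n_t_plus_dt, dt):
    """Failures per unit of time over (t, t + dt) for the whole population."""
    return (n_t - n_t_plus_dt) / dt


def average_failure_frequency_per_unit(n_t, n_t_plus_dt, dt):
    """Same quantity, normalized by the number of units still correct at time t."""
    return average_failure_frequency(n_t, n_t_plus_dt, dt) / n_t


# Hypothetical observation: 1000 correct units at t, 990 after 100 hours.
population_rate = average_failure_frequency(1000, 990, 100.0)
unit_rate = average_failure_frequency_per_unit(1000, 990, 100.0)
print(population_rate, unit_rate)
```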

slide-7
SLIDE 7

Instantaneous Failure Frequency

If Δt tends to zero, each entity at time t is characterized by an instantaneous failure frequency given by:

$$h(t) = \lim_{\Delta t \to 0} \frac{1}{n(t)} \cdot \frac{n(t) - n(t + \Delta t)}{\Delta t} = -\frac{1}{n(t)} \frac{dn(t)}{dt} = -\frac{1}{N R(t)} \frac{d(N R(t))}{dt} = -\frac{N}{N R(t)} \frac{dR(t)}{dt} = -\frac{1}{R(t)} \frac{dR(t)}{dt}$$

Since:

$$h(t)\,dt = -\frac{dR(t)}{R(t)}$$

after integration (with R(0) = 1) we obtain the reliability function:

$$R(t) = e^{-\int_0^t h(\tau)\,d\tau}$$

slide-8
SLIDE 8

MTTF (Mean Time To Failure)

  • Index used to evaluate reliability and other dependability metrics.
  • MTTF (Mean Time To Failure): expected time before a failure, or expected operational time of a system before the occurrence of the first failure:

$$MTTF = \int_0^{\infty} t\,q(t)\,dt$$

  • It can also be calculated (expanding q(t) = −dR(t)/dt and integrating by parts) as:

$$MTTF = -\int_0^{\infty} t\,\frac{dR(t)}{dt}\,dt = \Big[-t\,R(t)\Big]_0^{\infty} + \int_0^{\infty} R(t)\,dt = \int_0^{\infty} R(t)\,dt$$

since

$$\lim_{t \to \infty} t\,R(t) = \lim_{t \to \infty} t\,e^{-\int_0^t h(\tau)\,d\tau} = 0$$

given that h(t) is constant or increases over time.
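The equivalence of the two MTTF integrals can be checked numerically. The sketch below assumes a constant failure frequency (the value of `lam` is hypothetical) and uses a hand-written trapezoidal rule over a finite horizon, so the results are approximations; for a constant failure frequency λ both should be close to 1/λ.

```python
import math


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * width)
    return total * width


lam = 0.01  # hypothetical constant failure frequency (failures per hour)


def reliability(t):
    return math.exp(-lam * t)


def failure_density(t):
    return lam * math.exp(-lam * t)


# Finite horizon standing in for +infinity: the neglected tail is of order e^-30.
horizon = 3000.0
mttf_from_q = trapezoid(lambda t: t * failure_density(t), 0.0, horizon, 30000)
mttf_from_r = trapezoid(reliability, 0.0, horizon, 30000)

print(f"integral of t*q(t): {mttf_from_q:.4f}")
print(f"integral of R(t):   {mttf_from_r:.4f}")
print(f"1 / lambda:         {1.0 / lam:.4f}")
```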

slide-9
SLIDE 9

Bathtub Curve

[Figure: bathtub curve. Failure frequency function plotted against time, showing three regions: early faults (“infant mortality”), constant fault frequency, and wear-out.]

slide-10
SLIDE 10

Failure Frequency Function

  • The first and third regions can be excluded by assuming that the entities are used after the initial testing period and before they start aging.
  • Hence, the instantaneous fault frequency function can be assumed constant:

$$h(t) = \lambda$$

  • This determines the following values of the previously introduced expressions:

$$R(t) = e^{-\int_0^t \lambda\,d\tau} = e^{-\lambda t}$$

$$Q(t) = 1 - e^{-\lambda t}$$

$$r(t) = -\lambda e^{-\lambda t}$$

$$q(t) = \lambda e^{-\lambda t}$$

[Figure: plot of q(t) versus t.]
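Under the constant failure frequency assumption, the four functions reduce to simple exponentials. The sketch below evaluates them for a hypothetical λ and mission time, and checks that q(t)/R(t) gives back the constant failure frequency; the function names are illustrative only.

```python
import math


def reliability(lam, t):
    """R(t) = e^(-lambda * t)."""
    return math.exp(-lam * t)


def failure_function(lam, t):
    """Q(t) = 1 - e^(-lambda * t)."""
    return 1.0 - math.exp(-lam * t)


def failure_density(lam, t):
    """q(t) = lambda * e^(-lambda * t)."""
    return lam * math.exp(-lam * t)


lam = 2e-4        # hypothetical failure frequency, failures per hour
mission = 1000.0  # hypothetical mission time, hours

r = reliability(lam, mission)
q_cdf = failure_function(lam, mission)
q_pdf = failure_density(lam, mission)

print(f"R = {r:.6f}, Q = {q_cdf:.6f}, q = {q_pdf:.3e}, q/R = {q_pdf / r:.3e}")
```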

slide-11
SLIDE 11

Repairable Systems

  • In the case of repairable systems, besides the “fault occurrence” event, the “repair” or “replacement” of the faulty components also has to be considered:
      • MTTF (Mean Time To Failure)
      • MTTR (Mean Time To Repair): the average time needed to repair or replace a faulty entity
  • System Availability:

$$A = \frac{MTTF}{MTTF + MTTR}$$

  • MTBF (Mean Time Between Faults) is the average time between two faults, given by the sum of MTTF and MTTR.
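A short numeric illustration of the availability and MTBF formulas follows. The MTTF and MTTR values are hypothetical, as are the function names.

```python
def availability(mttf, mttr):
    """A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def mtbf(mttf, mttr):
    """MTBF = MTTF + MTTR."""
    return mttf + mttr


# Hypothetical entity: fails on average every 1000 h, takes 10 h to repair.
a = availability(1000.0, 10.0)
between_faults = mtbf(1000.0, 10.0)
print(f"A = {a:.4f}, MTBF = {between_faults:.0f} h")
```

Note how availability is driven by the ratio between MTTR and MTTF: halving the repair time helps about as much as doubling the time to failure.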

slide-12
SLIDE 12

Cover Factor

  • Conditional probability that, after the occurrence of a failure, the system returns to functioning correctly.
  • Measure of the system’s ability to detect a fault, localize it, contain it and restore a consistent and error-free state.
  • To estimate it, one needs to identify every possible fault and, for each fault, forecast its frequency and the corresponding cover factor. Limits:
      • It is hard to determine the probability of every possible fault
      • Often it is unrealistic to take into account every possible fault
      • The cover factor is determined considering one fault at a time, whereas one should take into account the possibility of multiple concurrent faults.
slide-13
SLIDE 13

Dependability Evaluation

  • Dependability evaluation of a complex system can be performed via either:
      • COMBINATORIAL MODELS (Combinatorial Methods):
          1. reliability
          2. availability
      • MARKOVIAN MODELS (Markov Processes):
          1. reliability
          2. availability
          3. security
          4. performability
slide-14
SLIDE 14

Combinatorial Models

  • The evaluation of the availability and reliability of computing systems considers the system as composed of a set of interconnected entities.
      • First step: identify the availability and reliability of each constituent entity;
      • Second step: identify the configurations that allow the analyzed system to operate according to the project’s specifications;
      • Third step: identify the relation between the faults of each entity and those of the whole system.
  • Entities, in their turn, are made up of components whose dependability metrics depend on:
      – Components’ quality
      – Maintenance policies
      – Mutual interconnections

slide-15
SLIDE 15

Interconnections

  • Typical interconnections are:
      – Serial
      – Parallel
      – TMR (Triple Modular Redundancy)
      – Hybrid M out of N

slide-16
SLIDE 16

Serial Interconnection

  • K entities are serially interconnected when the functioning of the system depends on the correct functioning of all the K entities.

[Figure: entities C1, C2, ..., CK connected in series.]

  • Given:
      – R_i(t) = reliability of each entity
      – A_i = availability of each entity
  • one can derive the following system-wide metrics:

$$R(t) = \prod_{i=1}^{K} R_i(t) \qquad A = \prod_{i=1}^{K} A_i$$
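The serial product can be sketched as a one-line function. The same function serves for reliabilities evaluated at a fixed time t and for availabilities; the name `series` and the sample values are illustrative assumptions.

```python
import math


def series(values):
    """Reliability (or availability) of entities connected in series:
    the product of the individual values."""
    return math.prod(values)


# Hypothetical chain of three entities.
a_series = series([0.99, 0.95, 0.90])
print(f"A = {a_series:.5f}")
```

Because every factor is at most 1, a serial system is never more dependable than its weakest entity.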

slide-17
SLIDE 17

Parallel Interconnection

  • K entities are interconnected in parallel when the functioning of the system is guaranteed even if just a single entity works.

[Figure: entities C1, C2, ..., CK connected in parallel.]

  • Given:
      – R_i(t) = reliability of each entity
      – A_i = availability of each entity
  • we can derive the following system-wide metrics, observing that the system does not work (is unavailable) only if all K entities fail (are unavailable):

$$R(t) = 1 - (1 - R_1(t))(1 - R_2(t)) \cdots (1 - R_K(t))$$

$$A = 1 - (1 - A_1)(1 - A_2) \cdots (1 - A_K)$$
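A minimal sketch of the parallel formula: the system fails only if every entity fails, so the individual failure probabilities are multiplied and the result is complemented. The function name `parallel` and the sample values are hypothetical.

```python
import math


def parallel(values):
    """Reliability (or availability) of entities connected in parallel:
    one minus the probability that all of them fail."""
    return 1.0 - math.prod(1.0 - v for v in values)


# Hypothetical group of three entities with different availabilities.
a_parallel = parallel([0.9, 0.8, 0.7])
print(f"A = {a_parallel:.4f}")
```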

slide-18
SLIDE 18

Parallel Interconnection (cont’d)

  • In the case of entities having the same reliability R_C(t) or availability A_C we get that:

$$R(t) = 1 - (1 - R_C(t))^K \qquad A = 1 - (1 - A_C)^K$$

[Figure: two plots for K = 1, 2, 3: system reliability R(t) versus t, and system availability A versus entity availability A_C (axes from 0.7 to 1.0).]
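The trend shown in the plots can be reproduced by tabulating the formula for identical entities. This is only a sketch: the grid of A_C values is taken from the axis range of the figure, and the function name is hypothetical.

```python
def parallel_identical(a_c, k):
    """A = 1 - (1 - A_C)^K for K identical entities in parallel."""
    return 1.0 - (1.0 - a_c) ** k


for a_c in (0.7, 0.8, 0.9):
    cells = ", ".join(
        f"K={k}: {parallel_identical(a_c, k):.4f}" for k in (1, 2, 3)
    )
    print(f"A_C = {a_c:.1f} -> {cells}")
```

Each added entity multiplies the residual unavailability by (1 - A_C), so the gain per extra entity shrinks as K grows.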

slide-19
SLIDE 19

TMR Interconnection

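  • The system fails or is not available when two entities are simultaneously faulty/unavailable or when the voter is faulty/unavailable:

$$R(t) = \left[ R_C^3(t) + 3 R_C^2(t)\,(1 - R_C(t)) \right] R_{VOTER}(t)$$

$$A = \left[ A_C^3 + 3 A_C^2\,(1 - A_C) \right] A_{VOTER}$$

[Figure: TMR scheme. The input I feeds three entities C1, C2, C3, whose outputs go to a voter (labelled r/n) that produces the output O.]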
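The TMR expression can be evaluated directly: the bracket is the probability that all three entities work plus the probability that exactly two work, and the voter acts as a serial element. The function name `tmr` and the numeric values are hypothetical.

```python
def tmr(r_c, r_voter):
    """Reliability (or availability) of a TMR arrangement of three identical
    entities with value r_c, followed by a voter with value r_voter."""
    all_three_work = r_c ** 3
    exactly_two_work = 3 * r_c ** 2 * (1.0 - r_c)
    return (all_three_work + exactly_two_work) * r_voter


ideal_voter = tmr(0.9, 1.0)
real_voter = tmr(0.9, 0.99)
print(f"perfect voter: {ideal_voter:.5f}, voter at 0.99: {real_voter:.5f}")
```

With these sample values the imperfect voter already removes part of the gain over a single entity at 0.9, which is why the voter's dependability matters as much as the replication.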

slide-20
SLIDE 20

Parallel/Serial Interconnections

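[Figure: block diagram from input I to output O. Block C1 is the parallel of C11 and C12, where C11 is the series of C111 and C112; block C2 is the parallel of C21, C22 and C23; C1 and C2 are connected in series.]

$$R_{11} = R_{111} \cdot R_{112}$$

$$R_1 = 1 - (1 - R_{11})(1 - R_{12})$$

$$R_2 = 1 - (1 - R_{21})(1 - R_{22})(1 - R_{23})$$

$$R = R_1 \cdot R_2$$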
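The step-by-step reduction above maps naturally onto two small helper functions applied from the innermost block outwards. In this sketch every leaf entity is given the same hypothetical reliability of 0.9; the helper names are illustrative.

```python
import math


def series(values):
    """Serial interconnection: product of the individual reliabilities."""
    return math.prod(values)


def parallel(values):
    """Parallel interconnection: one minus the product of the unreliabilities."""
    return 1.0 - math.prod(1.0 - v for v in values)


leaf = 0.9  # hypothetical reliability of every elementary entity

r11 = series([leaf, leaf])            # R11 = R111 * R112
r1 = parallel([r11, leaf])            # R1 = 1 - (1 - R11)(1 - R12)
r2 = parallel([leaf, leaf, leaf])     # R2 = 1 - (1 - R21)(1 - R22)(1 - R23)
r_system = series([r1, r2])           # R = R1 * R2

print(f"R11 = {r11:.4f}, R1 = {r1:.4f}, R2 = {r2:.4f}, R = {r_system:.6f}")
```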

slide-21
SLIDE 21

Hybrid M out of N interconnection

  • The system works as long as there are at least M correct entities, namely at most K = N - M entities fail.
  • Given:
      – R_C(t) = reliability of each entity (the same for all entities)
      – A_C = availability of each entity (the same for all entities)
  • one can derive the following system-wide metrics:

$$R(t) = \sum_{i=0}^{K} \binom{N}{i} R_C^{N-i}(t)\,(1 - R_C(t))^{i} \qquad A = \sum_{i=0}^{K} \binom{N}{i} A_C^{N-i}\,(1 - A_C)^{i}$$

  • In fact, the probability that:
      – N entities are correct is: $R_C^{N}(t)$
      – N-1 entities are correct is: $N\,R_C^{N-1}(t)\,(1 - R_C(t))$
      – N-2 entities are correct is: $\binom{N}{2} R_C^{N-2}(t)\,(1 - R_C(t))^{2}$
      – N-K entities are correct is: $\binom{N}{K} R_C^{N-K}(t)\,(1 - R_C(t))^{K}$
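The M-out-of-N sum can be sketched with `math.comb` (available from Python 3.8). The function name `m_out_of_n` and the sample values are hypothetical; the code assumes identical entities, as the formula does.

```python
import math


def m_out_of_n(n, m, r_c):
    """Probability that at least m of n identical entities are correct,
    i.e. that at most k = n - m of them fail."""
    k = n - m
    return sum(
        math.comb(n, i) * r_c ** (n - i) * (1.0 - r_c) ** i
        for i in range(k + 1)
    )


two_of_three = m_out_of_n(3, 2, 0.9)   # same bracket as TMR with a perfect voter
one_of_four = m_out_of_n(4, 1, 0.9)    # reduces to four entities in parallel
four_of_four = m_out_of_n(4, 4, 0.9)   # reduces to four entities in series
print(two_of_three, one_of_four, four_of_four)
```

The two limiting cases are a useful sanity check: M = 1 gives the parallel formula and M = N gives the serial one.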

slide-22
SLIDE 22

Evaluation Examples

  • Let us consider a non-redundant system composed of 4 serially connected entities:

[Figure: I → S1 → S2 → S3 → S4 → O]

$$R(t) = R_1(t)\,R_2(t)\,R_3(t)\,R_4(t) \qquad A = A_1 A_2 A_3 A_4$$

  • How can I increase the system’s dependability?

slide-23
SLIDE 23

Pair with a duplicated system

[Figure: two copies of the chain S1 → S2 → S3 → S4 connected in parallel between I and O.]

$$R_{d1}(t) = 1 - (1 - R(t))^{2} \qquad A_{d1} = 1 - (1 - A)^{2}$$

slide-24
SLIDE 24

Duplicate Each Component

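[Figure: each entity S1, S2, S3, S4 is duplicated in parallel, and the four duplicated pairs are connected in series starting from I.]

$$R_{d2}(t) = R_{1d}(t)\,R_{2d}(t)\,R_{3d}(t)\,R_{4d}(t) \qquad A_{d2} = A_{1d}\,A_{2d}\,A_{3d}\,A_{4d}$$

where:

$$R_{id}(t) = 1 - (1 - R_i(t))^{2} \qquad A_{id} = 1 - (1 - A_i)^{2}$$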

slide-25
SLIDE 25

Quantifying the dependability of the considered configurations

  • Assuming, e.g., that each A_i = 0.9, the system’s availability in the three cases is, respectively:
      – A = 0.6561
      – A_d1 = 0.8817
      – A_d2 = 0.9606
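The three figures can be re-derived with a few lines of Python, under the stated assumption that every entity has availability 0.9. The helper names are illustrative and not part of the original material.

```python
import math


def series(values):
    """Serial interconnection: product of the availabilities."""
    return math.prod(values)


def parallel(values):
    """Parallel interconnection: one minus the product of the unavailabilities."""
    return 1.0 - math.prod(1.0 - v for v in values)


a_i = 0.9
entities = [a_i] * 4

# Non-redundant chain S1-S2-S3-S4.
a_plain = series(entities)

# Whole chain duplicated: two identical chains in parallel.
a_d1 = parallel([a_plain, a_plain])

# Each entity duplicated: four parallel pairs connected in series.
a_d2 = series([parallel([a, a]) for a in entities])

print(f"A = {a_plain:.4f}, A_d1 = {a_d1:.4f}, A_d2 = {a_d2:.4f}")
```

Duplicating each entity gives a higher availability than duplicating the whole chain with the same number of components, because a fault in one entity no longer takes out an entire chain.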