Mixed Criticality: A Personal View (Alan Burns)

SLIDE 1

Mixed Criticality A Personal View

Alan Burns

SLIDE 2

Contents

• Some discussion on the notion of mixed criticality
• A brief overview of the literature
• An augmented system model
• Open issues (as I see it)

SLIDE 3

What is Criticality

• A classification primarily based on the consequences and likelihood of failure
  • Wrong/late/missing output
  • HAZOPS
• Dictates the procedures required in the design, verification and implementation of the code
• Dictates the level of hardware redundancy
• Has enormous cost implications

SLIDE 4

What Mixed Criticality is

• A means of dealing with the inherent uncertainty in a complex system
• A means of providing efficient resource usage in the context of this uncertainty
• A means of protecting the more critical work when faults occur
  • Including where assumptions are violated (rely conditions are false)
  • Note some tasks are fail-stop/safe, others are fail-operational, regardless of criticality
SLIDE 5

What Mixed Criticality isn’t

• Not a mixture of hard and soft deadlines
• Not a mixture of critical and non-critical
• Not (only) delivering isolation and non-interference
• Not dropping tasks to make a system schedulable

SLIDE 6

WCET – a source of uncertainty

• We know that WCET cannot be known with certainty
• All estimates have a probability of being wrong (too low)
• But all estimates are attempting to be safe (pessimistic)
• In particular C(LO) is a valid engineered estimate with the belief that C(LO) > WCET

SLIDE 7

Events

• An event-driven system must make assumptions about the intensity of the events
• Again this cannot be known with certainty
• So load parameters need to be estimated (safely)
• In particular T(LO) < T(real)

SLIDE 8

Fault Tolerance

• Critical systems need to demonstrate survivability
• Faults will occur, and some level must be tolerated
• Faults are not independent
• Faults might relate to the assumptions upon which the verification of the timing behaviour of the system was based
  • E.g. WCET, arrival rates, battery power
SLIDE 9

Fault Models

• Fault models give a means of assessing/delivering survivability
  • Full functional behaviour with a certain level of faults
  • Graceful Degradation for more severe faults
  • Graceful Degradation is a controlled reduction in functionality, aiming to preserve safety
• For example:
  • If any task executes for more than C(LO) and all HI-criticality tasks execute for no more than C(HI) then it can be demonstrated that all HI-criticality tasks meet their deadlines
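
The example fault model above can be read as a three-way classification of observed behaviour. The following is a minimal sketch of that reading, not something taken from the slides: the `Task` fields, the `classify` function, the three labels and the two-task example are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    crit: str     # "LO" or "HI"
    c_lo: float   # C(LO) estimate
    c_hi: float   # C(HI) estimate (assumed equal to c_lo for LO-crit tasks)


def classify(tasks, observed):
    """Classify observed execution times against the example fault model.

    'normal'        : every task stayed within its C(LO)
    'degraded'      : some task exceeded C(LO), but every HI-crit task
                      stayed within C(HI), so HI-crit deadlines are covered
    'outside-model' : a HI-crit task exceeded C(HI); the fault model gives
                      no guarantee here
    """
    if all(observed[t.name] <= t.c_lo for t in tasks):
        return "normal"
    if all(observed[t.name] <= t.c_hi for t in tasks if t.crit == "HI"):
        return "degraded"
    return "outside-model"


# Hypothetical two-task system
tasks = [Task("control", "HI", 2, 4), Task("logging", "LO", 3, 3)]
print(classify(tasks, {"control": 2, "logging": 3}))
print(classify(tasks, {"control": 3, "logging": 3}))
print(classify(tasks, {"control": 5, "logging": 1}))
```

The third outcome is the case a later slide raises: the strategy should also say something when the C(HI) bound itself turns out to be wrong.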

SLIDE 10

Graceful Degradation

• As a strategy for Graceful Degradation a number of schemes have been proposed in the MCS literature:
  • Drop all lower-criticality work
  • Drop some, using notions of importance etc.
  • Extend periods (elastic task model)
  • Reduce functionality within low and high crit work
• The strategy should also cover the case where the C(HI) bound is itself wrong

SLIDE 11

Graceful Degradation

• If tasks are dropped/aborted then this cannot be arbitrary: the approach must be related back to the software architecture / task dependencies
  • Use of fault-trees perhaps
• Recovery must also relate to the needs of the software (e.g. dealing with missing/stale state)
• Normal behaviour should be called that, normal, not LO-criticality mode

SLIDE 12

Fault Recovery

• After a fault and degraded functionality, it should be possible for the system to return to full functionality
  • A 747 can fly with 3 engines, but it's nice to get the 4th one back!
• This can be within the system model
• Or outside (cold/warm restart)
  • Typical with hardware redundancy
SLIDE 13

Existing Literature

• Since Vestal’s paper there have been at least 180 articles published (one every 2 weeks!)
• I hope you are all familiar with the review from York (updated every 6 months and funded by the MCC project)
  • www-user.cs.york.ac.uk/~burns/
• Some top level observations follow

SLIDE 14

Observations

• For uniprocessors:
  • For FPS, AMC seems to be the ‘standard’ approach
  • For EDF, schemes that have a virtual deadline for the HI-crit tasks seem to be standard
  • Server based schemes have been revisited
  • Not too much work on the scheduling schemes actually used in safety-critical systems, e.g. cyclic executives and non-preemptive (or cooperative) FPS
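
For readers unfamiliar with AMC, here is a minimal sketch of its response-time-bound analysis (commonly called AMC-rtb) for fixed priority scheduling with two criticality levels. It is not taken from these slides: the recurrences are the ones usually given for AMC-rtb in the literature, and the `Task` fields, function names and three-task example are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    crit: str      # "LO" or "HI"
    c_lo: int      # C(LO)
    c_hi: int      # C(HI); equal to c_lo for LO-crit tasks
    period: int    # T, minimum inter-arrival time
    deadline: int  # D, assumed <= T


def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def fixed_point(base, deadline, interference):
    """Iterate r = base + interference(r); None once r exceeds the deadline."""
    r = base
    while r <= deadline:
        nxt = base + interference(r)
        if nxt == r:
            return r
        r = nxt
    return None


def amc_rtb(tasks):
    """Return {name: (R_LO, R_HI)}; tasks must be listed highest priority first."""
    results = {}
    for i, t in enumerate(tasks):
        hp = tasks[:i]
        hp_hi = [j for j in hp if j.crit == "HI"]
        hp_lo = [j for j in hp if j.crit == "LO"]
        # Normal (LO) mode: every task is charged its C(LO)
        r_lo = fixed_point(
            t.c_lo, t.deadline,
            lambda r: sum(ceil_div(r, j.period) * j.c_lo for j in hp))
        r_hi = None
        if t.crit == "HI" and r_lo is not None:
            # LO-crit interference is capped at releases before R_LO,
            # since LO-crit tasks are abandoned after the mode change
            capped = sum(ceil_div(r_lo, k.period) * k.c_lo for k in hp_lo)
            r_hi = fixed_point(
                t.c_hi + capped, t.deadline,
                lambda r: sum(ceil_div(r, j.period) * j.c_hi for j in hp_hi))
        results[t.name] = (r_lo, r_hi)
    return results


# Hypothetical task set, highest priority first
tasks = [
    Task("t1", "HI", 1, 2, 5, 5),
    Task("t2", "LO", 2, 2, 10, 10),
    Task("t3", "HI", 2, 4, 20, 20),
]
results = amc_rtb(tasks)
print(results)  # {'t1': (1, 2), 't2': (3, None), 't3': (5, 10)}
```

A `None` entry for R_LO means the bound exceeded the deadline; for LO-crit tasks R_HI is always `None` because no HI-mode guarantee is sought for them.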

SLIDE 15

Observations

• For multiprocessor systems there are a number of schemes (extensions from uni-criticality systems)
• Similarly for resource sharing protocols
• Work on communications is less well represented
• Lots of work on graceful degradation
• On allocation: ‘to separate or integrate, that is the question’

SLIDE 16

Observations

• Almost all papers stick to just two criticality levels
  • But LO-crit does not mean no-crit
• Some pay lip service to multiple levels
• What is the model we require for, say, 4 or 5 levels?
• It does not seem to make sense to have five estimates of WCET

SLIDE 17

Observations

• Little on linking to fault tolerance in general
• Little work on probabilistic assessment of uncertainty
• Some implementation work, but not enough
• Some comparative evaluations, but not enough
• Good coverage of formal issues such as speed-up factors

SLIDE 18

Augmented Model

• Four criticality levels (a, b, c, d) plus a non-critical level (e)
• How many estimates of WCET?
• I feel a sufficiently expressive model can be obtained by having only two levels, C(normal) and C(self)
• So tasks of crit d just have C(normal)
• Tasks of crit c have C(self) and C(normal)
• Tasks of crit b have C(self), C(normal), C(normal)

SLIDE 19

Augmented Model

• All guarantees are met with C(normal)s
• No task can execute for more than its C(self)
  • Run-time monitoring required
  • Mode change giving more time is possible
• If a task of crit b, say, exceeds its C(normal) then it must remain schedulable if it uses up to C(self), crit a tasks use C(normal) and no other tasks need to be guaranteed
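
One way to picture the run-time monitoring this implies is sketched below. It is an illustration under stated assumptions, not the model's definition: the `Task` fields, the action names, giving crit a tasks a C(self), and stopping a task that has no C(self) as soon as it exceeds C(normal) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

ORDER = "abcde"  # 'a' is the most critical level, 'e' is non-critical


@dataclass
class Task:
    name: str
    crit: str                     # one of "a".."e"
    c_normal: int                 # C(normal)
    c_self: Optional[int] = None  # C(self); None if the task has none


def budget_action(task, executed):
    """Decide what a run-time monitor does given the time executed so far."""
    if executed <= task.c_normal:
        return "continue"   # normal behaviour, all guarantees hold
    if task.c_self is not None and executed <= task.c_self:
        return "degrade"    # overrun tolerated, lower-crit guarantees lapse
    return "stop"           # assumption: never run past C(self)


def still_guaranteed(tasks, overrunning):
    """Names of tasks that keep a guarantee once `overrunning` exceeds C(normal):
    the task itself plus all strictly more critical tasks."""
    level = ORDER.index(overrunning.crit)
    return [t.name for t in tasks
            if t is overrunning or ORDER.index(t.crit) < level]


# Hypothetical task set
a1 = Task("a1", "a", 2, 5)
b1 = Task("b1", "b", 3, 6)
c1 = Task("c1", "c", 2, 4)
d1 = Task("d1", "d", 1)
tasks = [a1, b1, c1, d1]

print(budget_action(b1, 5))         # degrade
print(still_guaranteed(tasks, b1))  # ['a1', 'b1']
```

The schedulability question is then whether `b1` running up to its C(self) and `a1` running up to its C(normal) still meet their deadlines, with nothing assumed about `c1` and `d1`.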

SLIDE 20

Open Issues

1. As well as looking at mixing criticality levels within a single scheduling scheme (e.g. different priorities within FPS) we need to look at integrating different schemes (e.g. Cyclic Executives for safety-critical, FPS for mission critical, on the same processor)
2. More work is needed to integrate the run-time behaviour (monitoring and control) with the assumptions made during static verification

SLIDE 21

Open Issues

3. We need to be more holistic in terms of ALL system resources (especially communications media)
4. There are a number of formal aspects of scheduling still to be investigated (we should not apologise for finding the research in this area fascinating)

SLIDE 22

Open Issues

5. We need to be sure that techniques scale to at least 5 levels of criticality
6. There are still a number of open issues with regard to graceful degradation and fault recovery
7. There is little work as yet on security as an aspect of criticality
8. We need protocols for information sharing between criticality levels

SLIDE 23

Open Issues

9. We need better WCET analysis to reduce the (safe) C(HI) and C(LO) values
10. We should look to have an impact on the Standards relevant to the application domains we hope to influence
11. Better models for system overheads and task dependencies
12. How many criticality levels to support?
SLIDE 24

Open Issues

13. We do not as yet have the structures (models, methods, protocols, analysis etc.) that allow tradeoffs between sharing and separation to be evaluated

SLIDE 25

Conclusion

• We have lots to discuss