SLIDE 1
Mixed Criticality: A Personal View
Alan Burns
SLIDE 2
Contents
- Some discussion on the notion of mixed criticality
- A brief overview of the literature
- An augmented system model
- Open issues (as I see it)
SLIDE 3
What is Criticality
A classification primarily based on the consequences and likelihood of failure
- Wrong/late/missing output
- HAZOPS
Dictates the procedures required in the design, verification and implementation of the code
Dictates the level of hardware redundancy
Has enormous cost implications
SLIDE 4
What Mixed Criticality is
A means of dealing with the inherent uncertainty in a complex system
A means of providing efficient resource usage in the context of this uncertainty
A means of protecting the more critical work when faults occur
- Including where assumptions are violated (rely conditions are false)
- Note some tasks are fail-stop/safe, others are fail-operational, regardless of criticality
SLIDE 5
What Mixed Criticality isn’t
Not a mixture of hard and soft deadlines
Not a mixture of critical and non-critical
Not (only) delivering isolation and non-interference
Not dropping tasks to make a system schedulable
SLIDE 6
WCET – a source of uncertainty
We know that WCET cannot be known with certainty
All estimates have a probability of being wrong (too low)
But all estimates are attempting to be safe (pessimistic)
In particular, C(LO) is a valid engineered estimate, made in the belief that C(LO) > WCET
SLIDE 7
Events
An event-driven system must make assumptions about the intensity of the events
Again this cannot be known with certainty
So load parameters need to be estimated (safely)
In particular, T(LO) < T(real)
SLIDE 8
Fault Tolerance
Critical systems need to demonstrate survivability
Faults will occur, and some level must be tolerated
Faults are not independent
Faults might relate to the assumptions upon which the verification of the timing behaviour of the system was based
- E.g. WCET, arrival rates, battery power
SLIDE 9
Fault Models
Fault models give a means of assessing/delivering survivability
- Full functional behaviour with a certain level of faults
- Graceful Degradation for more severe faults
- Graceful Degradation is a controlled reduction in functionality, aiming to preserve safety
For example:
- If any task executes for more than its C(LO), and all HI-criticality tasks execute for no more than their C(HI), then it can be demonstrated that all HI-criticality tasks meet their deadlines
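This kind of guarantee is what AMC-style response-time analysis establishes for fixed-priority scheduling. A minimal sketch of the AMC-rtb test (the simpler of the AMC bounds), using an invented three-task set; the task values are illustrative, not from the talk:

```python
from math import ceil

# Hypothetical task set: (name, T, D, C_LO, C_HI, crit), listed in
# priority order, highest first. Values invented for illustration.
TASKS = [
    ("t1", 10, 10, 2, 4, "HI"),
    ("t2", 20, 20, 3, 3, "LO"),
    ("t3", 40, 40, 5, 10, "HI"),
]

def rta_lo(i):
    """LO-mode response time of task i: every task uses its C(LO)."""
    _, _, D, C_LO, _, _ = TASKS[i]
    R = C_LO
    while True:
        R_new = C_LO + sum(ceil(R / Tj) * Cj_LO
                           for (_, Tj, _, Cj_LO, _, _) in TASKS[:i])
        if R_new == R:
            return R
        if R_new > D:
            return None          # unschedulable in LO mode
        R = R_new

def rta_amc_rtb(i):
    """AMC-rtb bound for a HI-crit task after the mode change:
    higher-priority HI-crit tasks interfere with C(HI); LO-crit
    tasks interfere only up to the LO-mode response time."""
    _, _, D, _, C_HI, crit = TASKS[i]
    assert crit == "HI"
    R_lo = rta_lo(i)
    if R_lo is None:
        return None
    R = C_HI
    while True:
        R_new = C_HI
        for (_, Tj, _, Cj_LO, Cj_HI, cj) in TASKS[:i]:
            if cj == "HI":
                R_new += ceil(R / Tj) * Cj_HI
            else:
                R_new += ceil(R_lo / Tj) * Cj_LO
        if R_new == R:
            return R
        if R_new > D:
            return None          # deadline miss possible after mode change
        R = R_new
```

For the invented set above, t3 has a LO-mode response time of 10 and an AMC-rtb bound of 25, both within its deadline of 40, so the guarantee on this slide holds for it.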
SLIDE 10
Graceful Degradation
A number of schemes have been proposed in the MCS literature as strategies for Graceful Degradation:
- Drop all lower critical work
- Drop some, using notions of importance etc.
- Extend periods (elastic task model)
- Reduce functionality within low and high crit work
The strategy should also extend to the case where the C(HI) bound is itself wrong
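The elastic task model in the list above can be sketched in a few lines: on a mode change, rather than dropping LO-crit tasks, their periods are stretched until total utilisation fits a reduced budget. This is a simplified single-pass version of the idea (the full algorithm iterates, respecting maximum-period bounds); the task values are invented:

```python
def elastic_periods(tasks, u_cap):
    """tasks: list of (C, T, e) with elasticity e (e == 0 => inelastic).
    Returns new periods so that total utilisation <= u_cap.
    Single pass: assumes the cut never drives a utilisation to zero."""
    u = [C / T for (C, T, e) in tasks]
    excess = sum(u) - u_cap
    if excess <= 0:
        return [T for (_, T, _) in tasks]       # already fits, no change
    e_sum = sum(e for (_, _, e) in tasks)
    # each elastic task gives up a share of the excess proportional to e;
    # inelastic tasks (e == 0) keep their utilisation, hence their period
    return [C / (ui - excess * e / e_sum)
            for (C, T, e), ui in zip(tasks, u)]

# e.g. three tasks at 0.475 total utilisation squeezed into a 0.4 budget:
# the inelastic first task is untouched, the other two are stretched
periods = elastic_periods([(2, 10, 0), (3, 20, 1), (5, 40, 1)], 0.4)
```

The design choice is that degradation is graded rather than binary: every LO-crit task keeps running, just less often.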
SLIDE 11
Graceful Degradation
If tasks are dropped/aborted then this cannot be arbitrary – the approach must be related back to the software architecture / task dependencies
- Use of fault-trees perhaps
Recovery must also relate to the needs of the software (e.g. dealing with missing/stale state)
Normal behaviour should be called just that, normal, not LO-criticality mode
SLIDE 12
Fault Recovery
After a fault and a period of degraded functionality, it should be possible for the system to return to full functionality
- A 747 can fly on 3 engines, but it's nice to get the 4th one back!
This can be within the system model
Or outside it (cold/warm restart)
- Typical with hardware redundancy
SLIDE 13
Existing Literature
Since Vestal’s paper there have been at least 180 articles published (one every 2 weeks!)
I hope you are all familiar with the review from York (updated every 6 months and funded by the MCC project)
- www-user.cs.york.ac.uk/~burns/
Some top level observations follow
SLIDE 14
Observations
For uniprocessors:
- For FPS, AMC seems to be the ‘standard’ approach
- For EDF, schemes that have a virtual deadline for the HI-crit tasks seem to be standard
- Server based schemes have been revisited
- Not too much work on the scheduling schemes actually used in safety-critical systems, e.g. cyclic executives and non-preemptive (or cooperative) FPS
SLIDE 15
Observations
For multiprocessor systems there are a number of schemes (extensions from uni-criticality systems)
Similarly for resource sharing protocols
Work on communications is less well represented
Lots of work on graceful degradation
On allocation – ‘to separate or integrate, that is the question’
SLIDE 16
Observations
Almost all papers stick to just two criticality levels
- But LO-crit does not mean no-crit
Some pay lip service to multiple levels
What is the model we require for, say, 4 or 5 levels?
It does not seem to make sense to have five estimates of WCET
SLIDE 17
Observations
Little on linking to fault tolerance in general
Little work on probabilistic assessment of uncertainty
Some implementation work, but not enough
Some comparative evaluations, but not enough
Good coverage of formal issues such as speed-up factors
SLIDE 18
Augmented Model
Four criticality levels (a, b, c, d) plus a non-critical level (e)
How many estimates of WCET?
I feel a sufficiently expressive model can be obtained by only having two levels, C(normal) and C(self)
So tasks of crit d just have C(normal)
Tasks of crit c have C(self) and C(normal)
Tasks of crit b have C(self), C(normal), C(normal)
SLIDE 19
Augmented Model
All guarantees are met with C(normal)s
No task can execute for more than its C(self)
- Run-time monitoring required
- Mode change giving more time is possible
If a task of crit b, say, exceeds its C(normal), then the system must remain schedulable when that task uses up to its C(self), crit a tasks use their C(normal), and no other tasks need to be guaranteed
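A sketch of the run-time monitoring this implies: each task carries the two budgets, an overrun of C(normal) triggers the mode change, and C(self) is a hard abort limit. The class and callback names here are invented for illustration; a real kernel would do this with per-task execution-time timers:

```python
class BudgetMonitor:
    """Per-task execution budget with the two levels of the
    augmented model: C(normal) and C(self). Invented sketch."""

    def __init__(self, c_normal, c_self, on_overrun):
        assert c_normal <= c_self
        self.c_normal, self.c_self = c_normal, c_self
        self.on_overrun = on_overrun     # e.g. trigger a criticality mode change
        self.used = 0.0

    def charge(self, delta):
        """Called by the scheduler as execution time accumulates."""
        crossed = self.used <= self.c_normal < self.used + delta
        self.used += delta
        if crossed:
            self.on_overrun(self)        # exceeded C(normal): mode change, once
        if self.used > self.c_self:
            # C(self) exhausted: the task may not run further
            raise RuntimeError("C(self) exhausted: abort/fail-stop the task")
```

The two thresholds map directly onto the model: crossing C(normal) is the point at which other tasks' guarantees may be withdrawn, while C(self) bounds the damage any single task can do.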
SLIDE 20
Open Issues
- 1. As well as looking at mixing criticality levels within a single scheduling scheme (e.g. different priorities within FPS) we need to look at integrating different schemes (e.g. Cyclic Executives for safety-critical, FPS for mission critical – on the same processor)
- 2. More work is needed to integrate the run-time behaviour (monitoring and control) with the assumptions made during static verification
SLIDE 21
Open Issues
- 3. We need to be more holistic in terms of ALL system resources (especially communications media)
- 4. There are a number of formal aspects of scheduling still to be investigated (we should not apologise for finding the research in this area fascinating)
SLIDE 22
Open Issues
- 5. We need to be sure that techniques scale to at least 5 levels of criticality
- 6. There are still a number of open issues with regard to graceful degradation and fault recovery
- 7. There is little work as yet on security as an aspect of criticality
- 8. We need protocols for information sharing between criticality levels
SLIDE 23
Open Issues
- 9. We need better WCET analysis to reduce the (safe) C(HI) and C(LO) values
- 10. We should look to have an impact on the standards relevant to the application domains we hope to influence
- 11. Better models for system overheads and task dependencies
- 12. How many criticality levels to support?
SLIDE 24
Open Issues
- 13. We do not as yet have the structures (models, methods, protocols, analysis etc.) that allow tradeoffs between sharing and separation to be evaluated
SLIDE 25