

slide-1
SLIDE 1

Road to certification in multicore partitioned mixed criticality systems

(Experience from MultiPARTES, DREAMS and PROXIMA FP7)

November 04th, 2014. Berlin. AUTOSAR Working Group

  • Dr. Jon Perez
  • Dr. Alfons Crespo
slide-2
SLIDE 2

Agenda

  • Introduction
      – IKERLAN
      – FENTISS
      – Road to certification
      – Basic concepts
  • Industrial domain (IEC-61508)
      – Dissemination
      – Safety concept
      – What we have learnt
  • Space domain
      – Dissemination
      – Qualification
      – What we have learnt
  • Achievements are the foundation for…
      – FP7 PROXIMA
      – FP7 DREAMS
  • Questions


slide-3
SLIDE 3
Introduction – Mixed-Criticality

  • “Modern electronic systems used in industry (avionics, automotive, etc) combine applications with different security, safety, and real-time requirements. Systems with such mixed requirements are often referred to as mixed-criticality systems” [Baumann, 2011]
  • “The integration of applications of different criticality (safety, security, real-time and non-real time) in a single embedded system is referred as mixed-criticality system” [Perez, 2014]

slide-4
SLIDE 4

Road to certification


slide-5
SLIDE 5

Basic concepts – Different terms

  • Academia: temporal isolation, safety, safety-critical, …
  • Industry (IEC-61508): fail-safe / operational, temporal independence, compliant item, high demand, diagnostic coverage, …

slide-6
SLIDE 6

INDUSTRIAL DOMAIN (IEC-61508)


slide-7
SLIDE 7

Dissemination

  • Academic / Scientific:

    – Perez, Jon, David Gonzalez, Carlos Fernando Nicolas, Ton Trapman and Jose Miguel Garate. "A Safety Certification Strategy for IEC-61508 Compliant Industrial Mixed-Criticality Systems Based on Multicore Partitioning." Euromicro DSD/SEAA, Verona, Italy, 2014.

  • Industry:

    – Perez, Jon, David Gonzalez, Salvador Trujillo, Anton Trapman and Jose Miguel Garate. "A Safety Concept for a Wind Power Mixed-Criticality Embedded System Based on Multicore Partitioning." In Functional Safety in Industry Application, 11th International TÜV Rheinland Symposium, 36. Cologne, Germany, 2014.

slide-8
SLIDE 8

Safety Concept

  • The safety concept was positively assessed by TÜV Rheinland, a relevant certification body in the industrial domain. Goals:
      – The review of a safety concept for a wind power case study, which serves as a representative proof-of-concept example to discuss the MultiPARTES contribution and the limitations / comments that should be taken into account in a future certification process
      – The dissemination of the MultiPARTES contribution to TÜV Rheinland
      – The gathering of detailed feedback from TÜV Rheinland
      – The definition of an action plan based on the feedback (if needed)

slide-9
SLIDE 9

Introduction – Context Diagram

[Figure: context diagram. Labels: Windpark Control Center; WebHMI; Maintenance; SCADA Client; SCADA; WT Heterogeneous Processing Unit (Safety, Supervision, HMI & Comms; I/O); actors: Developer, Maintenance, Operator, Park Client.]

slide-10
SLIDE 10

Introduction - Off-shore WT

  • A modern off-shore wind turbine dependable control system manages [1]:
      – I/Os: up to three thousand inputs / outputs
      – Functions & nodes: several hundred functions distributed over several hundred nodes
      – Distributed: grouped into eight subsystems interconnected with a fieldbus
      – Software: several hundred thousand lines of code

[1] Perez, Gonzalez et al.: "A safety concept for a wind power mixed-criticality embedded system based on multicore partitioning". Real Time Systems Symposium (RTSS) - MCS Workshop, Vancouver, December 2013.

slide-11
SLIDE 11

Introduction – Context Diagram

[Figure: context diagram of the current system, split into safety and non-safety-related parts. Labels: EtherCAT; HMI & Comms; Supervision; Safety Protection; speed sensor(s); sensor(s); activators; subsystems; safety chain; safety relay; output relay; pitch control.]

slide-12
SLIDE 12

Introduction – Proposed Solution

[Figure: context diagram of the proposed solution, split into safety and non-safety-related parts. Labels: EtherCAT; HMI & Comms; speed sensor(s); sensor(s); activators; subsystems; safety relay; Safety Protection; Supervision; safety chain; output relay; pitch control.]

slide-13
SLIDE 13

Safety Concept - Requirements

ID: Requirement
SR_WT_4: The <Protection System> safety function must activate the “safe state” if the “rotation speed” exceeds the “maximum rotation speed”.
SR_WT_5: The <Protection System> safety function must ensure the “safe state” during system initialization (prior to the running state where rotation speeds are compared).
SR_WT_6: The <Protection System> safety function must be provided with a SIL3 integrity level (IEC-61508).
SR_WT_7: The safe state is the de-energization of the output “safety relay(s)”.
SR_WT_8: The output “safety relay(s)” is (are) connected in series within the safety chain.
SR_WT_9: A single fault does not lead to the loss of the safety function: HFT=1 and Diagnostic Coverage (DC) of the system >= 90% (according to IEC-61508).
SR_WT_10: The reaction time must not exceed the PST (SR_WT_14).
SR_WT_11: Detected ‘severe errors’ lead to a “safe state” in less than the PST (SR_WT_14).
SR_WT_12: The “rotation speed” absolute measurement error must be equal to or below 1 rpm to be used by the <Protection System>. If the measurement error is ≥ 1 rpm it must be neglected.
SR_WT_13: The “Maximum Rotation Speed” must be configurable only during start-up (not running).
SR_WT_14: The Process Safety Time (PST) is 2 seconds.

Industrial Domain (IEC_61508) - Safety Concept
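As a rough illustration of how requirements SR_WT_4, SR_WT_5, SR_WT_7 and SR_WT_13 interact, the following Python sketch models the protection function as a small state machine. The class, the method names, the latching behaviour after an overspeed, and the example limits are all hypothetical and not taken from the MultiPARTES case study; a real SIL3 implementation would follow the IEC-61508 lifecycle rather than this simplified logic.

```python
from enum import Enum


class Mode(Enum):
    INIT = "init"        # start-up: the limit may still be configured (SR_WT_13)
    RUNNING = "running"  # rotation speed is being compared (SR_WT_4)
    SAFE = "safe"        # safe state after an overspeed (latched by assumption)


class ProtectionFunction:
    """Illustrative model only; names and behaviour are assumptions."""

    def __init__(self, max_rpm: float):
        self.max_rpm = max_rpm
        self.mode = Mode.INIT

    def configure(self, max_rpm: float) -> None:
        # SR_WT_13: the limit is configurable only during start-up.
        if self.mode is not Mode.INIT:
            raise RuntimeError("limit can only be changed during start-up")
        self.max_rpm = max_rpm

    def start(self) -> None:
        # Leave initialization and begin comparing rotation speeds.
        self.mode = Mode.RUNNING

    def step(self, rotation_rpm: float) -> bool:
        """Return True if the safety relay may be energized."""
        # SR_WT_4: exceeding the limit activates the safe state.
        if self.mode is Mode.RUNNING and rotation_rpm > self.max_rpm:
            self.mode = Mode.SAFE
        # SR_WT_5 / SR_WT_7: the relay stays de-energized unless running normally.
        return self.mode is Mode.RUNNING
```

In this sketch the relay command is de-energized by default, so initialization and any state other than normal running both map to the safe state.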

slide-14
SLIDE 14

Safety Concept – The approach…

DUAL PROCESSOR – 1oo2

  • Safety concept based on ‘common practice in industry’
  • Serves as a reference, not detailed

SINGLE PROCESSOR – 1oo2, partitioned, heterogeneous quad-core

  • Analogous safety concept using heterogeneous multicore and hypervisor
  • The MultiPARTES contribution

slide-15
SLIDE 15

Safety Concept (A – ‘Traditional’)

DUAL-PROCESSOR – 1oo2

[Figure: dual-processor 1oo2 architecture. Labels: Supervision; EtherCAT; safety relays; speed sensor(s); Safety Protection; processors P0 and P1; watchdogs (WDG); HMI; COM SERVER; DIAG; SCPU.]

Safety techniques (IEC-61508 SIL3):

  • 1oo2
  • HFT=1 and DC >= 90%
  • Dual diverse sensors
  • Dual independent safety relays connected in series
  • Dual diverse processors:
      – ‘P0’: safety functions only
      – ‘P1’: mixed functionalities
      – ‘P0/P1’: independent safety relay
  • Local diagnosis and reciprocal comparison by software (‘P0/P1’)
  • Communication: EtherCAT and ‘safety over EtherCAT’
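The effect of wiring two independent safety relays in series can be sketched as follows: the safety chain stays closed only if both channels energize their relay, so a single channel demanding the safe state is enough to reach it (1oo2). The function names and the 1 rpm tolerance used for the reciprocal comparison are assumptions made for illustration, not details from the safety concept.

```python
def channel_relay(speed_rpm: float, max_rpm: float, healthy: bool = True) -> bool:
    """One channel's relay command: energized only if healthy and within the limit."""
    return healthy and speed_rpm <= max_rpm


def safety_chain_closed(relay_a: bool, relay_b: bool) -> bool:
    """Relays in series: the chain is closed only if both relays are energized."""
    return relay_a and relay_b


def reciprocal_check(speed_a: float, speed_b: float, tolerance_rpm: float = 1.0) -> bool:
    """Cross-comparison of both channels' readings; False signals a discrepancy."""
    return abs(speed_a - speed_b) <= tolerance_rpm


def evaluate_1oo2(speed_a: float, speed_b: float, max_rpm: float) -> bool:
    """Return True if the turbine may keep running, False for the safe state."""
    agree = reciprocal_check(speed_a, speed_b)
    relay_a = channel_relay(speed_a, max_rpm, healthy=agree)
    relay_b = channel_relay(speed_b, max_rpm, healthy=agree)
    return safety_chain_closed(relay_a, relay_b)
```

A call such as `evaluate_1oo2(18.5, 17.8, 18.0)` returns `False` because one channel alone sees the overspeed and opens its relay; a disagreement between the channels larger than the tolerance also opens the chain.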

slide-16
SLIDE 16

Safety Concept (A – ‘Traditional’)

DUAL-PROCESSOR – 1oo2

[Figure: dual-processor 1oo2 architecture. Labels: Supervision; EtherCAT; safety relays; speed sensor(s); Safety Protection; processors P0 and P1; watchdogs (WDG); HMI; COM SERVER; DIAG; SCPU.]

Scalability limitations:

  • The number of functionalities continues to increase (real-time, safety and non-safety)
  • Usage of a fan is not allowed (reliability issue)
  • The performance capability of processor ‘P1’ reaches a limit

slide-17
SLIDE 17

Safety Concept (A – ‘Traditional’)

N PROCESSOR – 1oo2

[Figure: N-processor 1oo2 architecture. Labels: processors P0, P1, P2, P3; EtherCAT; speed sensor(s); SCPU; safety relays; watchdogs (WDG); COM SERVER; HMI; DIAG; Safety Protection; RT Control; Supervision.]

Increased scalability:

  • Add additional processors (P2, P3, etc.) to provide the required computation performance

Reduced reliability:

  • The overall system reliability and availability is reduced

slide-18
SLIDE 18

Safety Concept (B – ‘Multicore partitioning’)

  • The fault hypothesis [1] of this strategy consists of the following assumptions:
      – FSM: All safety-relevant systems are developed with an IEC-61508 Functional Safety Management (FSM)
      – Node: The node computer forms a single Fault-Containment Region (FCR) that can fail in an arbitrary failure mode. The permanent failure rate is assumed to be in the order of 10-100 FIT and the transient failure rate is assumed to be in the order of 100,000 FIT
      – Processor: The multicore processor might not provide temporal isolation (or not sufficient evidence for certification), but bounded temporal interference can be estimated and validated with measurements
      – Hypervisor: The hypervisor provides interference freeness among partitions (bounded time and spatial isolation); it is a compliant item and fails in an arbitrary failure mode when it is affected by a fault. Qualified tools.
      – Partition: A partition can fail in an arbitrary failure mode, both in the temporal as well as the spatial domain

[1] H. Kopetz, On the Fault Hypothesis for a Safety-Critical Real-Time System, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006, vol. 4147, ch. 3, pp. 31–42.
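To give a feel for the failure rates assumed for the node, the short Python sketch below converts FIT values into a mean time between failures. It relies on the standard definition of 1 FIT as one failure per 10^9 device-hours and on the simplifying assumption of a constant failure rate; the helper names are hypothetical.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 8760     # 365 days, ignoring leap years


def fit_to_rate_per_hour(fit: float) -> float:
    """Failure rate in failures per hour."""
    return fit / HOURS_PER_FIT_UNIT


def mean_time_to_failure_years(fit: float) -> float:
    """Mean time to failure in years, assuming a constant failure rate."""
    return 1.0 / fit_to_rate_per_hour(fit) / HOURS_PER_YEAR


for label, fit in [("permanent, 10 FIT", 10),
                   ("permanent, 100 FIT", 100),
                   ("transient, 100,000 FIT", 100_000)]:
    years = mean_time_to_failure_years(fit)
    print(f"{label}: about {years:,.1f} years between failures")
```

Under these assumptions, 100,000 FIT corresponds to roughly one transient fault per 10,000 operating hours, a little over a year of continuous operation, while 100 FIT corresponds to more than a thousand years.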

slide-19
SLIDE 19

Safety Concept (B – ‘Multicore partitioning’) 1/3

PARTITIONED

[Figure: partitioned architecture on a processor with hypervisor. Labels: EtherCAT; speed sensor(s); SCPU; safety relays; Safety Protection; DIAG; watchdogs (WDG); COM SERVER; HMI; P0; Supervision; Processor + Hypervisor.]

Is it feasible to develop a ‘partitioned’ solution?

  • Usage of a certifiable hypervisor
  • System partitioning (safety, real-time and non-real-time partitions)
  • Interference freeness of non-safety partitions with safety partitions, and of lower criticality levels with higher criticality levels

slide-20
SLIDE 20

Safety Concept (B – ‘Multicore partitioning’) 2/3

[Figure: partitions mapped to the cores of a single processor. Labels: LEON3 FT + hypervisor (×2); x86 + hypervisor (×2); EtherCAT; speed sensor(s); P0; SCPU; safety relays; watchdogs (WDG); COM SERVER; HMI; DIAG; Safety Protection; Supervision; Processor.]

‘Partitions’ mapped to a multicore processor:

  • Heterogeneous quad core
  • Dual diverse cores for safety partitions
  • Partitioning and multicore allocation enable resource usage and performance maximization while ensuring interference freeness

SAFETY CPU: SINGLE PROCESSOR, QUAD CORE, PARTITIONED – 1oo2

slide-21
SLIDE 21

Safety Concept (B – ‘Multicore partitioning’) 3/3

[Figure: hardware view of the safety CPU (single processor, quad core, partitioned, 1oo2). Labels: x86 + Hypervisor (×2); LEON3 FT + Hypervisor (×2); L1 caches; L2 cache; LS MEM; External Shared Memory; External Shared Memory 2; AHB bus; GW AHB/PCIe; PCIe; watchdog devices WD_A and WD_B with CLK; IO devices; core device; periodic interrupt; partitions: Safety Protection, DIAG, Supervision, RT Control, HMI, COM SERVER; EtherCAT; speed sensor(s); safety relays; watchdogs (WDG); P0; SCPU.]

slide-22
SLIDE 22

Safety Concept (B – ‘Multicore partitioning’)

  • Scheduling:
      – Static cyclic scheduling algorithm
      – Pre-assigned guaranteed time slots
      – Defined at design time
      – Synchronized based on the global notion of time
  • Diagnosis:
      – The partition should be self-contained and should provide safety life-cycle related techniques and platform-independent diagnosis abstracted from the details of the underlying platform
      – The hardware provides autonomous diagnosis and diagnosis components to be commanded by software
      – The hypervisor and associated diagnosis partitions should support platform-related diagnosis
      – The system architect specifies and integrates additional diagnosis partitions required to develop a safe product, taking into consideration all safety manuals
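The static cyclic scheduling described above can be pictured as a table that is fixed at design time and replayed every major frame. The Python sketch below is a minimal model under stated assumptions: the 100 ms major frame, the slot boundaries, and the partition names are hypothetical, and it does not reflect the configuration format of any particular hypervisor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    core: int
    start_ms: int
    end_ms: int
    partition: str


MAJOR_FRAME_MS = 100  # hypothetical major frame length

# Hypothetical plan, fixed at design time.
PLAN = [
    Slot(0, 0, 40, "safety_protection"),
    Slot(0, 40, 60, "diag"),
    Slot(1, 0, 50, "supervision"),
    Slot(1, 50, 100, "hmi"),
]


def validate(plan, major_frame_ms):
    """Reject plans whose slots leave the frame or overlap on the same core."""
    by_core = {}
    for slot in plan:
        if not 0 <= slot.start_ms < slot.end_ms <= major_frame_ms:
            raise ValueError(f"slot outside major frame: {slot}")
        by_core.setdefault(slot.core, []).append(slot)
    for slots in by_core.values():
        slots.sort(key=lambda s: s.start_ms)
        for prev, nxt in zip(slots, slots[1:]):
            if nxt.start_ms < prev.end_ms:
                raise ValueError(f"overlapping slots: {prev} / {nxt}")


def active_partition(plan, major_frame_ms, core, global_time_ms):
    """Partition owning `core` at the given global time, or None if the core is idle."""
    offset = global_time_ms % major_frame_ms
    for slot in plan:
        if slot.core == core and slot.start_ms <= offset < slot.end_ms:
            return slot.partition
    return None


validate(PLAN, MAJOR_FRAME_MS)
```

Because the lookup depends only on the global time modulo the major frame, every core derives the same answer from the shared time base, which is what the synchronization bullet asks for.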


slide-23
SLIDE 23

Safety Concept (B – ‘Multicore partitioning’)

  • Safety techniques:
      – Measures to reduce the probability of systematic faults
          • The overall system is developed and certified using a SIL3 FSM compliant with IEC-61508
          • The hypervisor is a compliant item
          • Qualified tools according to IEC-61508-3 (see chapter 7.4.4)
      – Detailed FMEAs, measures to control errors and system reaction to errors

slide-24
SLIDE 24

Safety Concept (B – ‘Multicore partitioning’) 3/3

[Figure: repeat of the hardware view shown on SLIDE 21 (safety CPU, single processor, quad core, partitioned, 1oo2).]

slide-25
SLIDE 25

Lessons learnt

  • The mixed-criticality paradigm based on COTS multicore and partitioning provides multiple potential benefits, but certification is a challenge
  • It is possible to achieve SIL3 (IEC-61508) / PLd (ISO-13849) with multicore and partitioning
  • Temporal independence (IEC-61508):
      – Temporal isolation simplifies the safety argumentation, but temporal independence does not necessarily require temporal isolation
      – Temporal independence must be met according to IEC-61508. The lack of temporal isolation could reduce the availability of the system but should not jeopardize safety (fault avoidance and control)
  • The safety concept highly depends on the details of the underlying processor
  • The assumptions and analysis considered at this stage will be reviewed in the following design stages and validated at the final stage of the case study
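One way to read the distinction between temporal isolation and temporal independence is as a budget check: if interference from other cores cannot be eliminated but can be bounded, a partition's time slot is sized for its execution time in isolation plus that bound, and an overrun is handled as a detected error. The Python sketch below only illustrates this reasoning; the function names and all figures are hypothetical, and in practice the bound would come from analysis validated with measurements.

```python
def slot_is_sufficient(wcet_isolated_ms: float,
                       interference_bound_ms: float,
                       slot_ms: float) -> bool:
    """True if the slot covers the isolated WCET plus the bounded interference."""
    return wcet_isolated_ms + interference_bound_ms <= slot_ms


def bound_holds(measured_ms, wcet_isolated_ms: float,
                interference_bound_ms: float) -> bool:
    """Check measured execution times against the assumed bound."""
    return max(measured_ms) <= wcet_isolated_ms + interference_bound_ms


def reaction(overrun_detected: bool) -> str:
    """An overrun costs availability, not safety: the system enters the safe state."""
    return "safe_state" if overrun_detected else "continue"
```

For example, with a 6.0 ms isolated execution time and a 1.5 ms interference bound, an 8.0 ms slot is sufficient, whereas a 2.5 ms bound would not fit and the slot or the allocation would have to change.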

slide-26
SLIDE 26

SPACE DOMAIN (ECSS)

slide-27
SLIDE 27

Space application assessment

  • Space applications are qualified using the ECSS-E-ST-40 and ECSS-Q-ST-80 standards
  • E-40 defines software engineering requirements for space software systems. The standard has a similar view as ISO/IEC 12207, using an approach to software development based on a defined set of processes
  • Q-80 defines product assurance requirements for developing software for space systems
  • ECSS standards partially implemented:
      – Requirements, design, implementation, V&V processes used in virtualization layer, guest-OS & most of application development
      – Quality requirements being considered in some parts of the system

slide-28
SLIDE 28

Space application assessment

  • Current achievements
      – A technological assessment of XtratuM has been carried out by ‘Softwcare’ (ESA contract)
          • Roadmap for XtratuM qualification as defined
      – The ORK OS has been qualified to level B for space applications
      – Model-based techniques used for application design are consistent with ESA practice (e.g. Matlab/Simulink)
      – Schedulability analysis as per E-40 requirements for level-B software
  • ‘Softwcare’ working on extended assessment of the case study:
      – Advice on roadmap to qualification
      – Assessment on the use of safety mechanisms and conformity to E-40/Q-80

slide-29
SLIDE 29

ACHIEVEMENTS FOUNDATION FOR (OTHER PROJECTS)

slide-30
SLIDE 30

Results used in other projects….

  • FP7 DREAMS
      – Modular safety cases (hypervisor, COTS processor, partition, system)
      – Certification of product lines (variability)
      – Updated wind turbine safety concept
  • FP7 PROXIMA
      – Safety concept, SIL4 railway signalling EN-5012X

slide-31
SLIDE 31

Questions
