A Multi-Level Meta-Object Protocol for Fault-Tolerance in Complex Architectures



SLIDE 1

François Taïani (*), Jean-Charles Fabre, Marc-Olivier Killijian, LAAS-CNRS ((*) now at Lancaster University)

A Multi-Level Meta-Object Protocol for Fault-Tolerance in Complex Architectures

DSN'2005, The International Conference on Dependable Systems and Networks, Yokohama, Japan, June 28 - July 1, 2005

SLIDE 2

Motivating Example: Replication & Multithreading

- Goal: Transparent replication of a CORBA server
  - multi-layer: POSIX (OS) + CORBA (middleware)
  - multithreaded: concurrent processing of requests
  - thread pool: upper limit on concurrency

- Problem 1: state capture / restoration
  - not only application state, but also middleware + OS state

[Figure: replication of a server running on CORBA over the OS]

SLIDE 3

Motivating Example: Replication & Multithreading

- Goal: Transparent replication of a CORBA server
  - multi-layer: POSIX (OS) + CORBA (middleware)
  - multithreaded: concurrent processing of requests
  - thread pool: upper limit on concurrency

- Problem 1: state capture / restoration
  - not only application state, but also middleware + OS state

- Problem 2: control of non-determinism
  - assumption: multithreading is the only source of non-determinism
  - how can non-deterministic mutex decisions be replicated?

[Figure: replication of a server running on CORBA over the OS]

SLIDE 4

Enforcing Determinism: OS Only

[Figure: two replicas connected by a network, each an application / middleware / OS stack with FT support at the OS level]

Up to 203 synchronisation operations per request in the middleware (ORBacus) [TAO: 52, omniORB: 64]

- The same lock allocation can be enforced on all replicas.
- All replicas reach the same state.
- But only a small subset of the lock allocations impacts determinism.

Replicating every non-deterministic decision is highly inefficient.
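The first bullet (same lock allocation on all replicas) can be illustrated with a minimal C++ sketch, assuming a leader/follower scheme in which the leader replica logs the order in which threads obtained a mutex and each follower replays that order. `ReplayedMutex` and `runFollower` are hypothetical names, shipping the log between replicas is left out, and this is not the authors' implementation.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical follower-side mutex: lock grants are forced to follow the
// order recorded on the leader replica (a sequence of thread identifiers).
class ReplayedMutex {
public:
    explicit ReplayedMutex(std::vector<int> leaderOrder)
        : order_(std::move(leaderOrder)) {}

    // Blocks until the mutex is free AND the log says it is threadId's turn.
    void lock(int threadId) {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [&] {
            return !held_ && next_ < order_.size() && order_[next_] == threadId;
        });
        held_ = true;
        ++next_;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> guard(m_);
            held_ = false;
        }
        cv_.notify_all();  // let the next thread in the log re-check its turn
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<int> order_;
    std::size_t next_ = 0;
    bool held_ = false;
};

// Starts one thread per logged id, in reverse log order so that the start
// order differs from the logged order; each thread enters the critical
// section once. Returns the order in which the section was actually entered.
std::vector<int> runFollower(const std::vector<int>& leaderOrder) {
    ReplayedMutex mutex(leaderOrder);
    std::vector<int> entered;
    std::vector<std::thread> threads;
    for (auto it = leaderOrder.rbegin(); it != leaderOrder.rend(); ++it) {
        int id = *it;
        threads.emplace_back([&mutex, &entered, id] {
            mutex.lock(id);
            entered.push_back(id);  // protected by the replayed mutex
            mutex.unlock();
        });
    }
    for (auto& t : threads) t.join();
    return entered;
}
```

In a real replication protocol every logged grant must be transmitted to the followers, so doing this for all synchronisation operations of the middleware is exactly the cost the slide points out.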

SLIDE 5

Smart Multi-Level Reflection

[Figure: reification of application & middleware activity, combined with OS-level FT on each replica]

Only 3 of the synchronisation operations made by the middleware need to be replicated (ORBacus).

- With middleware and application semantics:
  - OS-level actions can be given higher-level semantics.
  - These semantics allow optimal use of OS-level reflection.

Combining information obtained at different levels greatly increases the efficiency of crosscutting mechanisms.

(Here: only 1.5% of the middleware's synchronisation activity actually needs to be replicated.)
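Assuming the 1.5% is computed against the per-request maximum of 203 middleware synchronisation operations quoted for ORBacus, the figure follows from the two numbers on these slides:

```latex
\frac{3}{203} \approx 0.0148 \approx 1.5\,\%
```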

SLIDE 6

The Vision

[Figure: a family of fault-tolerance mechanisms (meta-level) linked to the OS, middleware, and application (base-level) by a MOP (Meta-Object Protocol) acting as generic "glue": reflection through meta-interfaces]

SLIDE 7

The Problem

[Figure: fault-tolerance mechanisms on one side, OS / middleware / application stacks on the other, with the link between them marked "?"]

How to design & implement such a meta-object protocol?

SLIDE 8

Outline

- Motivating Example: Reflection and Replication
- A New Multi-Level MOP: Concepts & Design
- Practical Application: CORBA & Linux

SLIDE 9

Implementing Multi-Level Reflection

- Goal: To provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms

- Challenges:
  - Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
  - Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
  - Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?

SLIDE 10

Implementing Multi-Level Reflection

- Goal: To provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms

- Challenges:
  - Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
  - Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
  - Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?

SLIDE 11

Requirements

interface MetaRequestLifecycle {
  /** Communication **/
  requestHasBeenReceived (RequestID);
  replyHasBeenSent (RequestID);
  /** Control Path **/
  requestBeforeApplication (RequestID);
  requestAfterApplication (RequestID);
  /** Synchronisation **/
  requestBeforeContentionPoint (RequestID, RequestContentionPoint);
  requestAfterContentionPoint (RequestID, RequestContentionPoint);
};

[Figure: fault-tolerance mechanisms connected to the system through the MOP]

- Meta-interface for non-determinism [DSN-2003]
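As a sketch only, the meta-interface can be rendered as a C++ abstract class, assuming `RequestID` and `RequestContentionPoint` are plain integer identifiers (the slides do not define them). `LoggingMetaObject` is a hypothetical fault-tolerance meta-object that simply records the notifications it receives.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed representations; the slides leave these types unspecified.
using RequestID = std::uint64_t;
using RequestContentionPoint = std::uint64_t;

// C++ rendering of the MetaRequestLifecycle meta-interface.
class MetaRequestLifecycle {
public:
    virtual ~MetaRequestLifecycle() = default;
    // Communication
    virtual void requestHasBeenReceived(RequestID) = 0;
    virtual void replyHasBeenSent(RequestID) = 0;
    // Control path
    virtual void requestBeforeApplication(RequestID) = 0;
    virtual void requestAfterApplication(RequestID) = 0;
    // Synchronisation
    virtual void requestBeforeContentionPoint(RequestID, RequestContentionPoint) = 0;
    virtual void requestAfterContentionPoint(RequestID, RequestContentionPoint) = 0;
};

// Hypothetical meta-object that records every notification as a string.
class LoggingMetaObject : public MetaRequestLifecycle {
public:
    std::vector<std::string> events;

    void requestHasBeenReceived(RequestID r) override { add("received", r); }
    void replyHasBeenSent(RequestID r) override { add("replied", r); }
    void requestBeforeApplication(RequestID r) override { add("beforeApp", r); }
    void requestAfterApplication(RequestID r) override { add("afterApp", r); }
    void requestBeforeContentionPoint(RequestID r, RequestContentionPoint c) override {
        add("beforeCP" + std::to_string(c), r);
    }
    void requestAfterContentionPoint(RequestID r, RequestContentionPoint c) override {
        add("afterCP" + std::to_string(c), r);
    }

private:
    void add(const std::string& what, RequestID r) {
        events.push_back(what + ":" + std::to_string(r));
    }
};
```

A replication meta-object would implement the same interface but, for instance, forward the contention-point notifications to the replication protocol instead of logging them.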

SLIDE 12

Requirements

- Multi-level nature of the meta-interface

[Figure: fault-tolerance meta-model of a request's lifecycle across the application, middleware, and OS levels: request reception (OS: reception start / reception end), request pre-processing, request before application, request in application with request contention points (locks), request after application, request post-processing, and sending of reply (OS: reply start / reply end)]

SLIDE 13

Implementing Multi-Level Reflection

- Goal: To provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms

- Challenges:
  - Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
  - Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
  - Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?

SLIDE 14

Semantics and Architecture

- Motivating example: middleware non-determinism
  - request contention points (mutex operations) must be intercepted at OS level
  - but not all mutex operations (otherwise highly inefficient)
  - question: how to distinguish between mutexes that are relevant and those that are not?

- Proposal: use of semantic context
  - We need to understand the purpose of OS-level mutex operations in the more general context of the whole system activity.

Approach: trace the computation process that results in a low-level OS operation being called.

SLIDE 15

Meta-markers

- To trace semantic contexts, a mechanism is needed to transport information between different abstraction levels (software layers).
- A mechanism encountered in plants: in periods of drought, the root system communicates with the foliage using dedicated chemical substances called phytohormones.
- Phytohormones travel through the sap. Design based on this metaphor:
  - Sap = threads
  - Phytohormones = meta-markers

[Figure: roots signalling the foliage when there is no water]

SLIDE 16

Inter-Level Communication with Meta-Markers

[Figure: a thread execution path runs from the higher to the lower levels of the base level. A dormant meta-marker is attached to the thread and remains transparent along the path; at an interception point, the meta-marker gets activated (meta-level) and modifies low-level system behaviour.]
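The dormant / activated / transparent behaviour can be sketched in C++ with a `thread_local` pointer, assuming a thread carries at most one marker at a time. `MetaMarker`, `currentMarker`, and the operations below are hypothetical names; a real framework would intercept an OS-level call where this sketch simply calls a C++ function.

```cpp
#include <string>
#include <thread>

// Hypothetical meta-marker: a piece of higher-level context.
struct MetaMarker {
    std::string context;
};

// Each thread carries at most one marker, as the sap carries a hormone.
thread_local const MetaMarker* currentMarker = nullptr;

// Lower-level operation: it never receives the marker as a parameter.
// Without a marker it behaves normally (the marker is transparent); with
// one attached, the marker is "activated" and changes the behaviour.
std::string lowLevelOperation() {
    if (currentMarker == nullptr) return "default";
    return "instrumented for " + currentMarker->context;
}

// Higher-level code attaches a marker, calls down through the layers on the
// same thread, then detaches it.
std::string highLevelOperation() {
    MetaMarker marker{"request-queue"};
    currentMarker = &marker;
    std::string result = lowLevelOperation();
    currentMarker = nullptr;
    return result;
}

// A marker attached to the calling thread is invisible to another thread.
std::string lowLevelOperationOnOtherThread() {
    MetaMarker marker{"request-queue"};
    currentMarker = &marker;
    std::string result;
    std::thread worker([&result] { result = lowLevelOperation(); });
    worker.join();
    currentMarker = nullptr;
    return result;
}
```

Because the marker lives in thread-local storage, it follows the thread's execution path without changing any intermediate function signature; this sketch does not propagate markers when work is handed from one thread to another.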

SLIDE 17

Using Meta-Markers for MOP Design

- Meta-markers can be used to design a multi-level MOP.
- Example: synchronisation facet for middleware determinism

interface MetaRequestLifecycle {
  ...
  /** Synchronisation **/
  requestBeforeContentionPoint (RequestID, RequestContentionPoint);
  requestAfterContentionPoint (RequestID, RequestContentionPoint);
};

- Two issues to be solved by meta-markers:
  - P1: the global semantic context of mutex creation must be captured by meta-markers.
  - P2: meta-markers must ensure a correct instrumentation of the selected mutexes.

SLIDE 18

Capturing Semantics

init_and_run_middleware(..) {
  init_request_queue(..) ;
  init_some_refcount_object(..) ;
  ...
  run_ORB();
}

- Problem P1 is solved by source code annotation of semantic join points:

init_and_run_middleware(..) {
  MutexesAreRelevant metaMarker ;
  metaMarker.attachToThread() ;
  init_request_queue(..) ;          // mutexes created here are relevant for determinism
  metaMarker.detachFromThread() ;
  init_some_refcount_object(..) ;   // mutexes created here are not
  ...
  run_ORB();
}
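A runnable C++ version of the annotated routine, under stated assumptions: the marker registers itself in a `thread_local` variable, and `intercepted_mutex_init` is a hypothetical stand-in for the OS-level interception of mutex creation. Only the names `MutexesAreRelevant`, `attachToThread`, `detachFromThread`, and the `init_*` routines come from the slide; their bodies here are illustrative stubs.

```cpp
#include <vector>

// Hypothetical implementation of the MutexesAreRelevant meta-marker:
// attaching it makes it the current marker of the calling thread.
class MutexesAreRelevant {
public:
    void attachToThread() { current_ = this; }
    void detachFromThread() {
        if (current_ == this) current_ = nullptr;
    }
    static bool activeOnThisThread() { return current_ != nullptr; }

private:
    static thread_local MutexesAreRelevant* current_;
};
thread_local MutexesAreRelevant* MutexesAreRelevant::current_ = nullptr;

// Hypothetical stand-in for the interception of mutex creation at OS level
// (e.g. of pthread_mutex_init): it records whether the mutex was created
// under the marker, i.e. whether it is relevant for determinism.
struct MutexRecord {
    int id;
    bool relevantForDeterminism;
};
std::vector<MutexRecord> createdMutexes;

void intercepted_mutex_init() {
    int id = static_cast<int>(createdMutexes.size());
    createdMutexes.push_back({id, MutexesAreRelevant::activeOnThisThread()});
}

// Stub middleware initialisation routines (arguments omitted).
void init_request_queue() {
    intercepted_mutex_init();  // a component may create several mutexes
    intercepted_mutex_init();
}
void init_some_refcount_object() { intercepted_mutex_init(); }

void init_and_run_middleware() {
    MutexesAreRelevant metaMarker;  // plain declaration: "metaMarker()" would declare a function
    metaMarker.attachToThread();
    init_request_queue();           // mutexes created here are relevant for determinism
    metaMarker.detachFromThread();
    init_some_refcount_object();    // mutexes created here are not
    // run_ORB() is omitted from this sketch
}
```

Since `detachFromThread()` is skipped if an initialisation routine throws, a scope-bound (RAII) marker that detaches in its destructor would be a natural refinement of this design.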

SLIDE 19

Meta-Markers as Meta-Mutex Factories

[Figure: a MutexesAreRelevant meta-marker travels along the thread execution path from the middleware down to the OS. When a new mutex creation is intercepted, the meta-marker (meta-level) creates the new mutex and attaches it to a meta-mutex; newly created mutexes are then released into the OS (base level) among other, non-instrumented mutexes.]
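The factory role can be sketched in C++ as follows. `std::mutex` stands in for the OS mutex, and `MetaMutex`, `Mutex`, `MutexesAreRelevantMarker`, `activeMarker`, and `createMutex` are hypothetical names. The point is only that a mutex created while the marker is attached gets a meta-mutex reporting its lock operations, while the other mutexes stay un-instrumented.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Meta-level callback receiving lock notifications (hypothetical).
using MetaListener = std::function<void(const std::string&)>;

// Meta-mutex: the meta-level representative of one instrumented mutex.
class MetaMutex {
public:
    MetaMutex(int id, MetaListener listener) : id_(id), listener_(std::move(listener)) {}
    void beforeLock() { listener_("before-lock " + std::to_string(id_)); }
    void afterUnlock() { listener_("after-unlock " + std::to_string(id_)); }

private:
    int id_;
    MetaListener listener_;
};

// Base-level mutex. Without a meta-mutex attached it behaves like any
// non-instrumented mutex.
class Mutex {
public:
    void lock() {
        if (meta_) meta_->beforeLock();
        m_.lock();
    }
    void unlock() {
        m_.unlock();
        if (meta_) meta_->afterUnlock();
    }
    void attachMeta(std::unique_ptr<MetaMutex> meta) { meta_ = std::move(meta); }

private:
    std::mutex m_;
    std::unique_ptr<MetaMutex> meta_;
};

// Meta-marker acting as a meta-mutex factory: each mutex created while it
// is attached to the thread gets its own meta-mutex.
class MutexesAreRelevantMarker {
public:
    explicit MutexesAreRelevantMarker(MetaListener listener) : listener_(std::move(listener)) {}
    void onMutexCreated(Mutex& m) {
        m.attachMeta(std::make_unique<MetaMutex>(nextId_++, listener_));
    }

private:
    MetaListener listener_;
    int nextId_ = 0;
};

thread_local MutexesAreRelevantMarker* activeMarker = nullptr;

// Stand-in for intercepted mutex creation: the new mutex is handed back to
// the base level like any other, whether instrumented or not.
std::unique_ptr<Mutex> createMutex() {
    auto m = std::make_unique<Mutex>();
    if (activeMarker != nullptr) activeMarker->onMutexCreated(*m);
    return m;
}
```

Usage: with a marker attached, `createMutex()` yields a mutex whose `lock()` and `unlock()` reach the meta-level; once the marker is detached, newly created mutexes carry no meta-mutex and cost nothing extra.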

SLIDE 20

Back to the Meta-Interface

interface MetaRequestLifecycle {
  /** Communication **/
  requestHasBeenReceived (RequestID);
  replyHasBeenSent (RequestID);
  /** Control Path **/
  requestBeforeApplication (RequestID);
  requestAfterApplication (RequestID);
  /** Synchronisation **/
  requestBeforeContentionPoint (RequestID, RequestContentionPoint);
  requestAfterContentionPoint (RequestID, RequestContentionPoint);
};

- meta-markers to instrument appropriate mutexes
- meta-markers to instrument appropriate sockets
- meta-markers to transport request IDs
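How a meta-marker could carry request IDs down to contention points is sketched below, assuming `RequestID` is an integer and that a thread serves one request at a time. `RequestIdMarker`, `onContentionPoint`, `middlewareProcessing`, and `handleRequest` are hypothetical names, and the string notifications are placeholders for `requestBeforeContentionPoint` calls on the meta-object.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using RequestID = std::uint64_t;  // assumed representation

// Hypothetical meta-marker payload: the request the current thread serves.
thread_local std::optional<RequestID> currentRequest;

// Scope-bound marker: attaches a request ID to the thread and restores the
// previous value when the scope ends.
class RequestIdMarker {
public:
    explicit RequestIdMarker(RequestID id) : previous_(currentRequest) {
        currentRequest = id;
    }
    ~RequestIdMarker() { currentRequest = previous_; }
    RequestIdMarker(const RequestIdMarker&) = delete;
    RequestIdMarker& operator=(const RequestIdMarker&) = delete;

private:
    std::optional<RequestID> previous_;
};

// Placeholder for requestBeforeContentionPoint(RequestID, ...) notifications.
std::vector<std::string> notifications;

// Stand-in for the interception of a lock on an instrumented mutex: the
// request ID is recovered from the thread, not from the call's arguments.
void onContentionPoint(int contentionPoint) {
    if (!currentRequest) return;  // not serving a request: nothing to report
    notifications.push_back("request " + std::to_string(*currentRequest) +
                            " at contention point " + std::to_string(contentionPoint));
}

// Middleware code path; it knows nothing about request IDs.
void middlewareProcessing() { onContentionPoint(1); }

// Request reception: the marker stays attached for the lifetime of the request.
void handleRequest(RequestID id) {
    RequestIdMarker marker(id);
    middlewareProcessing();
}
```

Restoring the previous value in the destructor keeps the sketch correct even if markers are nested, without any change to the middleware code in between.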

SLIDE 21

Implementing Multi-Level Reflection

- Goal: To provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms

- Challenges:
  - Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
  - Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
  - Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?

SLIDE 22

Implementation

- Multilevel interception framework to control non-determinism: 8000 LoC C++; based on CORBA and POSIX only; platform independent.

[Figure: a Linux / ORBacus / application stack; mutex, thread, and socket interception (Linux) and request and contention-point interception (ORBacus) feed an ML-coordination layer, which is used by replication]

SLIDE 23

Case Study: ORBacus

- Behavioural analysis: a reverse engineering tool dedicated to complex multi-layer systems

[Figure: behavioural analysis graph of classes, thread creations, object creations, and method calls, annotated with RequestBeforeApplication, RequestAfterApplication, and RequestContentionPoint]

- This analysis indicates where to annotate the source code.
- Instrumentation of ORBacus:
  - 35 lines added → 0.02% of the original code! (> 100,000 LoC)
  - Very low intrusiveness
  - Highly efficient: number of interceptions ÷ 70
SLIDE 24

Conclusion

- Tension between comprehensive and adaptable fault-tolerance on the one hand, and the multi-component, multi-layered nature of modern complex software systems on the other.

- Our proposal to solve this conflict: Multi-Level Reflection (MLR)
  - Combines reflective capabilities found in lower and higher levels into a global system overview.
  - MLR is supported by a multi-level MOP based on:
    - semantic contexts
    - meta-markers

- Outlook: Aspect Orientation
  - Deep Aspects: make aspects aware of software "thickness"

SLIDE 25

Any Questions?

SLIDE 26

Motivation

- Increasingly complex computer systems (COTS / layers) are used for increasingly critical applications.
- Most COTS have not been built with dependability in mind.
- Dependability is a system-wide, multi-level issue.

[Figure: an OS / middleware / application stack with fault-tolerance "patches", ad-hoc inter-level coordination, and ad-hoc connections between the original code and the FT patches]

A costly solution, difficult to maintain and adapt.

- How to add fault-tolerance to complex multi-layered software systems in a transparent and disciplined way?
SLIDE 27

Rationale behind Multi-Level Reflection

- Complex systems contain heterogeneous abstraction levels.
- Available (meta-)information is heterogeneous.
- Higher levels (application, middleware):
  - rich semantics
  - but they lack information / control
- Lower levels (OS):
  - complete information / control
  - but lacking semantics
- Complementary roles: lower-level information & control needs to be enriched with higher-level semantics.

SLIDE 28

What is Reflection?

"the ability of a system to think and act about itself"

- separating fault-tolerance from functional concerns

[Figure: fault-tolerance at the meta-level, connected to the original system at the base-level through meta-interfaces acting as a "generic connector"; a meta-model supports observation and control]

SLIDE 29

What are Meta-Object Protocols?

A particular way of organising a reflective system

[Figure: meta-objects connected to base objects through the MOP]
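A minimal, single-level illustration of this organisation in C++. `MetaObject`, `Counter`, and `TracingMetaObject` are hypothetical names, and this is not the multi-level MOP presented in this talk: the base object contains only functional code, while its meta-object observes each call and may control whether it runs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Meta-object interface: base-level method calls are reified as calls to it.
class MetaObject {
public:
    virtual ~MetaObject() = default;
    // Control: returning false prevents the base-level method from running.
    virtual bool beforeInvocation(const std::string& method) = 0;
    // Observation: called after the base-level method has run.
    virtual void afterInvocation(const std::string& method) = 0;
};

// Base object: purely functional code; every call goes through its meta-object.
class Counter {
public:
    explicit Counter(MetaObject& meta) : meta_(meta) {}
    void increment(int by) {
        invoke("increment", [&] { value_ += by; });
    }
    int value() const { return value_; }

private:
    void invoke(const std::string& method, const std::function<void()>& body) {
        if (!meta_.beforeInvocation(method)) return;
        body();
        meta_.afterInvocation(method);
    }
    MetaObject& meta_;
    int value_ = 0;
};

// Hypothetical meta-object that traces invocations and can suspend them.
class TracingMetaObject : public MetaObject {
public:
    bool suspended = false;
    std::vector<std::string> trace;
    bool beforeInvocation(const std::string& method) override {
        trace.push_back("before " + method);
        return !suspended;
    }
    void afterInvocation(const std::string& method) override {
        trace.push_back("after " + method);
    }
};
```

Swapping `TracingMetaObject` for another meta-object changes the non-functional behaviour without touching `Counter`, which is the separation of concerns the previous slide attributes to reflection.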

SLIDE 30

Reflection & Fault Tolerance

- Reflection has been used to add FT to complex systems, but:
  - only one level of abstraction at a time has been considered so far.

[Figure: FT added separately at the OS level or at the middleware level of an OS / middleware / application stack]

Single-level reflection → limited fault-tolerance

SLIDE 31

Motivating Example: Replication & Multithreading

- Goal: Transparent replication of a CORBA server
  - multi-layer: POSIX (OS) + CORBA (middleware)
  - multithreaded: concurrent processing of requests
  - thread pool: upper limit on concurrency

[Figure: a client invoking a replicated CORBA server; each replica runs CORBA over the OS]

SLIDE 32

Requirements

[Figure: fault-tolerance meta-model of a request's lifecycle: request reception (reception start / reception end), request pre-processing, request before application, request in application with request contention points, request after application, request post-processing, and sending of reply (reply start / reply end)]

- Example: CORBA Middleware Determinism

SLIDE 33

Semantic Join Points

[Figure: two middleware operations, new RequestQueue and new RefCountObject, both reach the OS through the internal threading library as pthread_mutex_init() (mutex creation); the first must be replicated, the second need not be]

- Two low-level calls with different semantics.
- The global purpose becomes apparent when backtracking the computation process that causes the low-level calls.
- "Semantic join points": source code locations where the global purpose becomes apparent.

SLIDE 34

Implementing Multi-Level Reflection

- Goal: To provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms

- Challenges:
  - Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
  - Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
  - Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?