Fault Diagnosis and Virtual Entities in the Rainbow System Manager

Tony White, Niall Ross

System and Software Engineering, HK00 HALO2, T4E 742-3848

The Rainbow System Manager Alarm Correlation Engine

  • What is the RSM Alarm Correlation Engine (RACE)?
  • The Virtual Entity (VE)
  • RACE design objectives
  • RACE design description including:
    – application architecture
    – knowledge structures
    – inferencing mechanisms

  • Example scenario walk-through
  • Summary

The Alarm Correlation Engine

  • The Alarm Correlation Engine takes network event notifications (e.g., alarms) and generates a problem stream from them by inferencing over VE correlation communities in the RSM, using one or more knowledge bases

[Figure: the Alarm Correlation Engine. Labels: network event notifications, problem stream, knowledge base(s), VE correlation communities]

The Virtual Entity

[Figure: the Virtual Entity. Labels: Black Board; MA: RDSC; RSM Compute Platform; RTC Compute Platform; M-Protocol; Logical VE; M-Protocol "Light"; PMO; MO (full)]


Design Objectives

  • Provide an architecture that is flexible, i.e., alternate reasoning paradigms can be easily integrated
  • Generate a rule-based framework capable of having rules encoded directly in the OO language of implementation
  • The design should provide an easily extensible framework so that other NT products' event correlation needs can be met
  • Provide strongly hierarchical knowledge structuring mechanisms so that scalability and performance issues can be addressed
  • Allow for knowledge reuse between product knowledge bases and within the elements of a single product knowledge base

Design Philosophy

  • A problem-based approach is adopted, with a problem mapping to a fault on a managed object in the network
  • Problem objects communicate with each other with messages, in well-defined communities
  • Problem objects process messages received from other problem objects using rules

[Figure: Problems A, B and C, each holding its own rules (ruleA1, ruleA2, ...; ruleB1, ruleB2, ...; ruleC1, ...) and exchanging the messages msgAB, msgAC and msgCB]

The Alarm Correlation Engine is a Hybrid Rule and Message Passing System
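The hybrid idea can be illustrated with a minimal Python sketch: problem objects hold rules, and each rule reacts to a message by sending further messages to other problems. The class names, the rule bodies and the routing of msgAB, msgAC and msgCB are assumptions made for illustration; the engine itself encodes rules as Smalltalk methods.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Problem:
    name: str
    rules: List[Callable] = field(default_factory=list)
    seen: List[str] = field(default_factory=list)

    def receive(self, msg: Message, community: Dict[str, "Problem"]) -> None:
        """Record the message, then run every rule; rules return (target, text) pairs."""
        self.seen.append(f"{msg.sender}:{msg.text}")
        for rule in self.rules:
            for target, text in rule(self, msg):
                community[target].receive(Message(self.name, text), community)


def rule_a1(problem: Problem, msg: Message) -> List[Tuple[str, str]]:
    # Hypothetical rule: on an alarm, problem A informs problems B and C.
    if msg.text == "alarm":
        return [("B", "msgAB"), ("C", "msgAC")]
    return []


def rule_c1(problem: Problem, msg: Message) -> List[Tuple[str, str]]:
    # Hypothetical rule: problem C passes what it heard from A on to B.
    if msg.text == "msgAC":
        return [("B", "msgCB")]
    return []


community = {
    "A": Problem("A", [rule_a1]),
    "B": Problem("B"),
    "C": Problem("C", [rule_c1]),
}
community["A"].receive(Message("network", "alarm"), community)
print(community["B"].seen)
```

Problem B has no rules of its own here, so it only records what it receives; adding rules to B would let it react in turn.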

Application Architecture

[Figure: application architecture, showing a Production System and a Verification System. Labels: Black Board, MO, Problem Browser, Alarm Correlation Engine, Symbolic Debugger, Event Generator, Alarm Correlation Engine Controller, Event Notifications]

AC Engine Description: VE class

  • The AC engine requires the extension of the VE to include a specification of fault behavior
  • Each VE class now has a set of problem classes associated with it. Only the class names are added to the VE definition, and they might be placed in the 'F' area of the FCAPS specification of the VE. An example of the structure used in the AC engine prototype is shown below
  • e.g. ve_class(lc, [lc_problem_class])
  • NOTE: multiple problem classes can be defined for a VE class, and VE classes can share problem classes. Reuse of problem class information is supported by the design
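As a rough Python analogue of the ve_class(lc, [lc_problem_class]) structure, the sketch below maps VE class names to lists of problem class names. Only the "lc" entry comes from the slide; the "ax" and "pm" entries, the shared "card_problem_class" and both helper functions are hypothetical, added to show several problem classes per VE class and sharing between VE classes.

```python
from typing import Dict, List

# VE class name -> names of its associated problem classes.
VE_CLASSES: Dict[str, List[str]] = {
    "lc": ["lc_problem_class"],                         # from the slide
    "ax": ["ax_problem_class", "card_problem_class"],   # hypothetical
    "pm": ["card_problem_class"],                       # hypothetical, shared with "ax"
}


def problem_classes_for(ve_class: str) -> List[str]:
    """Return the problem class names attached to a VE class (empty if unknown)."""
    return list(VE_CLASSES.get(ve_class, []))


def ve_classes_sharing(problem_class: str) -> List[str]:
    """Return, in sorted order, the VE classes that reference a problem class."""
    return sorted(ve for ve, names in VE_CLASSES.items() if problem_class in names)


print(problem_classes_for("lc"))
print(ve_classes_sharing("card_problem_class"))
```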


AC Engine Description: Problem Class

  • Problem classes comprise:
    – a name and
    – an ordered collection of RuleSets
  • RuleSets are used to match rules with messages:
    – AlarmNotification(rule_name)
    – ProblemStateNotification(rule_name)
    – ProblemNotification(rule_name)
    – PropositionNotification(rule_name)
    – DeletedProblemNotification(rule_name)
  • RuleSets are defined for Problem classes and instances
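One way to picture a problem class and its RuleSets is a name plus a mapping from notification type to an ordered list of rule names. The Python sketch below assumes that shape; the ProblemClass API and the rule names are hypothetical, and only the notification type names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemClass:
    name: str
    # Notification type -> rule names, kept in the order the designer listed them.
    rule_sets: Dict[str, List[str]] = field(default_factory=dict)

    def add_rule(self, notification_type: str, rule_name: str) -> None:
        self.rule_sets.setdefault(notification_type, []).append(rule_name)

    def rules_for(self, notification_type: str) -> List[str]:
        """Only rules registered for this notification type are candidates."""
        return list(self.rule_sets.get(notification_type, []))


lc_problem = ProblemClass("lc_problem_class")
lc_problem.add_rule("AlarmNotification", "sonet_framing_alarm")    # hypothetical rule name
lc_problem.add_rule("AlarmNotification", "loss_of_signal")         # hypothetical rule name
lc_problem.add_rule("ProblemNotification", "sonet_framing_alarm")  # same rule under a second type

print(lc_problem.rules_for("AlarmNotification"))
```

Looking rules up by notification type means an alarm never triggers evaluation of rules registered only for, say, DeletedProblemNotification.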

AC Engine Description: Rule Types

  • Multiple rules of a given type can be defined; the AC engine evaluates these rules in order when determining the effects of a network event notification or capability change notification
  • Rules can appear as arguments of multiple types

The advantages of this design are:

  • Only rules appropriate to a given class of notification are evaluated, implying improved real-time performance
  • The knowledge base designer has control over the order in which rules are evaluated; the order in which rules are defined is unimportant, which implies easier maintenance
  • Rules can be reused between types and even between problem classes

Problem Description: Rule Definition

  • Rules are methods, but with an enhanced Smalltalk syntax
  • Rules consist of three distinct elements:
    – a name
    – a conjunction of a set of conditions, i.e., boolean expressions
    – a set of actions
  • Any piece of Smalltalk code can be embedded in a rule; rule actions are not limited in any way
  • The complete power and wealth of the Smalltalk class library and encoded BNR applications is thus available to the knowledge base designer
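The three elements of a rule can be sketched in Python as follows. This is an illustration of the name, conditions and actions structure only, not the enhanced Smalltalk syntax itself; the Rule class, the context dictionary and the example rule are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class Rule:
    name: str
    conditions: List[Callable[[Context], bool]]
    actions: List[Callable[[Context], None]]

    def fire(self, ctx: Context) -> bool:
        """Run the actions only if every condition holds (a conjunction)."""
        if all(condition(ctx) for condition in self.conditions):
            for action in self.actions:
                action(ctx)
            return True
        return False


# Hypothetical rule: raise a problem for a SONET framing alarm, but only once.
raise_problem = Rule(
    name="sonet_framing_alarm",
    conditions=[
        lambda ctx: ctx["alarm"] == "sonet framing",
        lambda ctx: not ctx["problems"],
    ],
    actions=[lambda ctx: ctx["problems"].append("sonet failure")],
)

ctx = {"alarm": "sonet framing", "problems": []}
fired_first = raise_problem.fire(ctx)    # both conditions hold, so the action runs
fired_second = raise_problem.fire(ctx)   # a problem already exists, so nothing runs
```

Because actions are arbitrary callables, they can do anything the host language allows, which mirrors the point that rule actions are not limited in any way.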

RuleBases and CompiledRuleBases

  • RuleBases are classes containing rules, with the rules being coded as instance and class methods
  • CompiledRuleBases are classes containing compiled rules, with the compiled rules coded as instance and class methods
  • RuleBase classes form a hierarchy such that rules in one class can be overridden in a subclass

[Figure: RuleBase B (rule 2, rule 3, ...) is a subclass of RuleBase A (rule 1, rule 2, ...)]
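Since rules are methods, a RuleBase hierarchy behaves like ordinary class inheritance. The Python sketch below assumes RuleBase B is the subclass, inheriting rule1, redefining rule2 and adding rule3; the method bodies and the run_rule helper are hypothetical.

```python
class RuleBaseA:
    """Hypothetical rule base: each rule is a method that reports what it did."""

    def rule1(self, event: str) -> str:
        return f"A.rule1 handled {event}"

    def rule2(self, event: str) -> str:
        return f"A.rule2 handled {event}"


class RuleBaseB(RuleBaseA):
    """Subclass: inherits rule1, redefines rule2 and adds rule3."""

    def rule2(self, event: str) -> str:
        return f"B.rule2 handled {event}"

    def rule3(self, event: str) -> str:
        return f"B.rule3 handled {event}"


def run_rule(rule_base: RuleBaseA, rule_name: str, event: str) -> str:
    # Look the rule up by name, much as a RuleSet entry names a rule.
    return getattr(rule_base, rule_name)(event)


print(run_rule(RuleBaseB(), "rule1", "alarm"))
print(run_rule(RuleBaseB(), "rule2", "alarm"))
```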


The Problem - RuleBase Relationship

Problem A: rule1, rule2
Problem B: rule2, rule3
Problem C: rule3, rule4
RuleBase X: rule1, rule2, rule3, rule4

Problems are related to RuleBase X indirectly, via ProblemRuleBaseMapper.

Correlation Communities

Correlation communities are sets of components that interact in order to provide some service or services.

Members of a correlation community can communicate in one of two ways. Firstly, components within the community may post to and read from the community notepad. Secondly, they may communicate with other community members by such interaction paths as are defined. For example, in a capability-managed system, these interaction paths would be the capability chain links.

[Figure: a Virtual Managed Entity community made up of Sonet1, LC1, LC2, AX1, AX2 and ATM link1, sharing a Community NotePad. Legend: interactor chain links, Community NotePad link]
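The two communication routes can be sketched in Python as a shared notepad plus a set of explicit interaction paths. The member names come from the figure; the Community class, its methods and the particular links chosen are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Community:
    members: Set[str]
    notepad: List[str] = field(default_factory=list)           # visible to every member
    links: Dict[str, Set[str]] = field(default_factory=dict)   # defined interaction paths

    def post(self, sender: str, note: str) -> None:
        """Route 1: post a note that any member can later read."""
        self.notepad.append(f"{sender}: {note}")

    def read(self) -> List[str]:
        return list(self.notepad)

    def link(self, a: str, b: str) -> None:
        """Route 2: declare an interaction path between two members."""
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def can_talk_directly(self, a: str, b: str) -> bool:
        return b in self.links.get(a, set())


community = Community({"Sonet1", "LC1", "LC2", "AX1", "AX2", "ATM link1"})
community.link("Sonet1", "LC1")   # assumed chain link
community.link("LC1", "AX1")      # assumed chain link
community.post("LC1", "sonet framing alarm")
```

In this sketch AX1 can reach LC1 directly but can only learn about Sonet1 through the notepad.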

Inferencing: Direct Communication

  • Event notification arrives for processing at a VE
  • Problem class or instance rules cause changes in capability or problem state
  • Changes are propagated to consumers of VE capabilities which, in turn, report these changes to their immediate capability suppliers

[Figure: an event arriving at one VE in the community of Sonet1, LC1, LC2, AX1, AX2 and ATM link1, with the Community NotePad]
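The propagation step can be sketched in Python as a breadth-first walk from the VE where the change occurred to the consumers of its capabilities. The supplier-to-consumer chain below is assumed for illustration (the slides do not spell out the links), and the sketch covers only the propagation to consumers, not the reports back to suppliers.

```python
from collections import deque
from typing import Dict, List

# Supplier -> consumers of its capabilities (assumed chain, for illustration only).
CONSUMERS: Dict[str, List[str]] = {
    "Sonet1": ["LC1", "LC2"],
    "LC1": ["AX1"],
    "LC2": ["AX2"],
    "AX1": ["ATM link1"],
    "AX2": ["ATM link1"],
}


def propagate(origin: str) -> List[str]:
    """Return the VEs notified of a capability change at origin, in visit order."""
    notified: List[str] = []
    seen = {origin}
    queue = deque([origin])
    while queue:
        ve = queue.popleft()
        for consumer in CONSUMERS.get(ve, []):
            if consumer not in seen:      # notify each consumer once
                seen.add(consumer)
                notified.append(consumer)
                queue.append(consumer)
    return notified


print(propagate("LC1"))
```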

Inferencing: Broadcasting

  • Event arrives at one VE which is indicative of a problem elsewhere in the capability chain
  • Information is broadcast to all VEs in the community via the community notepad

[Figure: an event arriving at one VE and being broadcast through the Community NotePad to Sonet1, LC1, LC2, AX1, AX2 and ATM link1]


Fault Scenario I

A SONET framing alarm arrives on the rhs LC1 (1), which posts the event on the notepad (2). The SONET and lhs LC1 members fire rules based upon the event (3), both creating problem instances and causing problem notifications to be posted on the notepad (4). Upon seeing the event (3), the lhs AX1 card stores the fact that the rhs LC1 has seen a SONET failure.

[Figure: two ATM modules, each with LC1, LC2, AX1, AX2 and Prot., joined by the SONET ATM link, with the Community NotePad; steps 1 to 4 are marked on the diagram, notepad connections omitted]

Fault Scenario II

A SONET framing alarm arrives at the rhs LC2 (1), which posts the event (2). The lhs LC1 fires rules for the event (3), creating a problem and causing a problem notification to be posted on the notepad (4). Upon seeing (3) and "knowing of the previous SONET alarm", the lhs AX1 generates a problem instance, causing a problem notification to be posted on the notepad (4). Also, LC2 generates a problem instance, causing a problem notification (4). The AX1 problem notification (5) is seen by LC1 and LC2, which fire rules that cause the deletion of the problems on those two components.

[Figure: two ATM modules, each with LC1, LC2, AX1, AX2 and Prot., joined by the SONET ATM link, with the Community NotePad; steps 1 to 5 are marked on the diagram, notepad connections omitted]
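The two scenarios can be replayed with a simplified Python sketch. It hard-codes one reading of the rules described above: the lhs LC1 raises a problem on every SONET framing event, SONET raises one for the first alarm, the LC2 in Scenario II is taken to be the lhs LC2, and the lhs AX1 raises its own problem once it knows of alarms from both rhs line cards. The class, the problem labels and the notepad tuples are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


class ScenarioReplay:
    def __init__(self) -> None:
        self.notepad: List[Tuple[str, str]] = []
        self.problems: Dict[str, Set[str]] = {
            name: set() for name in ("SONET", "lhs LC1", "lhs LC2", "lhs AX1")
        }
        self.ax1_memory: Set[str] = set()   # rhs cards AX1 knows have seen a failure

    def _raise(self, member: str, problem: str) -> None:
        # Create a problem instance and post a problem notification (step 4).
        self.problems[member].add(problem)
        self.notepad.append(("problem", member))

    def sonet_framing_alarm(self, rhs_card: str) -> None:
        self.notepad.append(("event", f"rhs {rhs_card}"))    # steps 1 and 2
        self._raise("lhs LC1", "sonet failure")              # step 3, assumed LC1 rule
        if rhs_card == "LC1":
            self._raise("SONET", "sonet failure")            # Scenario I only
        if rhs_card == "LC2":
            self._raise("lhs LC2", "sonet failure")          # Scenario II only
        self.ax1_memory.add(rhs_card)                        # AX1 stores the fact
        if self.ax1_memory >= {"LC1", "LC2"}:
            self._raise("lhs AX1", "correlated fault")       # hypothetical label
            for line_card in ("lhs LC1", "lhs LC2"):         # step 5: problems deleted
                self.problems[line_card].clear()
                self.notepad.append(("deleted", line_card))


replay = ScenarioReplay()
replay.sonet_framing_alarm("LC1")
after_first = {member: sorted(p) for member, p in replay.problems.items()}
replay.sonet_framing_alarm("LC2")
after_second = {member: sorted(p) for member, p in replay.problems.items()}
```

After the first alarm, problems exist on SONET and the lhs LC1; after the second, the line-card problems are gone and a single problem on the lhs AX1 stands in for them.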

Summary

  • An implementation of an alarm correlation engine capable of complex alarm correlation has been achieved
  • A hybrid rule and message passing system has been built
  • Complete separation of problem and rule knowledge has been achieved
  • Problem class and rule reuse are supported